ABSTRACT

Methods are presented (1) to partition or decompose a visual
scene into the bodies forming it; (2) to position these bodies in
three-dimensional space, by combining two scenes that make a
stereoscopic pair; (3) to find the regions or zones of a visual
scene that belong to its background; (4) to carry out the isolation
of objects in (1) when the input has inaccuracies. Running computer
programs implement the methods, and many examples illustrate their
behavior. The input is a two-dimensional line-drawing of the scene,
assumed to contain three-dimensional bodies possessing flat faces
(polyhedra); some of them may be partially occluded. Suggestions
are made for extending the work to curved objects. Some comparisons
are made with human visual perception.

The main conclusion is that it is possible to separate a picture 
or scene into the constituent objects exclusively on the basis of 
monocular geometric properties (on the basis of pure form); in fact, 
successful methods are shown. 

Thesis Supervisor: Marvin L. Minsky. 
Title: Professor of Electrical Engineering. 
If the machine is asked to separate the bodies, it must say

(BODIES ARE AS FOLLOWS : (18 9) (2 7) (3 5 6) (10 15) (4 13 14))

If asked to report the triangular prisms, it should answer

(10 15 IS A TRIANGULAR PRISM)

This thesis discusses the problems involved in this task.

What should be done when the information is noisy, some lines 
are missing, etc? 

How can the computer separate the background from the objects
forming the scene?

How should shadows be handled?

How can stereoscopic vision be used?

What about ambiguities and optical illusions?

This thesis also discusses some related aspects of human
visual perception.

Key words and phrases related to this study are as follows:



artificial intelligence
body
background
background discrimination
classification of images
CONVERT
cybernetics
feature recognition
geometric objects
geometric processing
graphic processing
graphical communication
graphical data
heuristic procedures
heuristic programming
identification
image
intelligence
line drawing
LISP
list processing
machine aided cognition
machine perception
mechanization of visual perception
object identification
optical
optical illusion
pattern
pattern matching
pattern recognition
photo-interpretation
picture
picture abstraction
picture processing
picture transformations
pictorial structures
Computer Review (A. C. M.) Index numbers: C.R. 3.61, 3.63,
4.22, 5.20.

Why this work was chosen as a thesis topic

The present work was carried out using the facilities of the
Artificial Intelligence Group of Project MAC, at M. I. T. Currently,
the main goal of the Artificial Intelligence Group (AI group) is
«to extend the way computers can interact with the real world:
specifically to develop better sensory and motor equipment, and
programs to control them.» {Minsky, Status Report II}. From such
efforts, a robot or mechanical manipulator has been constructed,
consisting of a PDP-6 computer, an image dissector camera, and a
mechanical arm and hand (see pictures).




IMAGE DISSECTOR CAMERA 



«These "eyes and hands" are eventually to be able to do reasonably
intelligent things but first, of course, it is difficult enough to
get them to do things that are easy for people to do.» {Ibid.}

An image dissector silently watches a triangular prism in the
vision laboratory of the A.I. Group.
The work was naturally divided into visual information processing
(computer vision) and manipulation and control of the arm-hand.
Thus, when I came as a graduate student from the Politecnico de Mexico
to M. I. T. (Sept. 65) and became associated with the AI Group, I
found a great interest there in graphical communication with computers.
Moreover, it was felt that symbol manipulation techniques would be
relevant to this area. I was fortunate enough to have had some contact
with the LISP language in some of its implementations:
MB-LISP {McIntosh 1963}* and Hawkinson-Yates-LISP {Hawkinson 64}
at the Centro Nacional de Calculo of the Politecnico; in fact, I
became interested in the area because I felt that it would be possible
to handle two-dimensional structures much in the same fashion as one
handles lists (that is, one-dimensional structures or strings of
symbols) in a pattern-driven language, such as CONVERT {1965}, recently
finished at that time.

The area also offered a good opportunity to understand and 
evaluate several techniques, computers, equipment, etc. Consequently 
I decided to work in it. 



* The parentheses { } always indicate a reference to the
bibliography at the end of this thesis, where the complete title,
date, etc., of the paper can be found.
SIMPLIFIED VIEW OF SCENE ANALYSIS

TO THE BUSY READER

This section presents a general view of the problems
in the thesis and their solutions; if you are short of time,

(1) Read the abstract and this section.

(2) Choose some scenes from section 'Analysis of many scenes',
and observe how the computer perceives them.

(3) Look through the table of contents, select additional topics.



Scene Analysis 

Scene analysis is the result of interaction between optical data
coming from the Eye, and knowledge about the visual world stored in
the programs. In all that follows, the optical data entering through
the Eye is reduced to a line drawing; this pass is called
pre-processing, and it will be only briefly sketched here.

After preprocessing, such a line drawing is analyzed in order
to discover and recognize given objects in it. The process is
called recognition.

This thesis is concerned with recognition.

The stylised presentation that follows is only an example; in
particular, scene analysis does not need to follow the sequence
pre-processing -> recognition. See 'Division of work in
Computer Vision' on page 60.

We now give a simplified exposition of both processes. Recognition
will be discussed abundantly in the remainder of this thesis, since
it is the main topic; readers who wish for more information on
pre-processing or other approaches should consult the references, for
instance {my MS Thesis} and {A C Shaw FJCC 68}. See also page 60.
Each inhomogeneous square is divided in four, ignoring
again the homogeneous sub-squares.




The process is repeated a few times more. 
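The subdivision just described can be sketched as follows. This is only an illustration under stated assumptions, not the pre-processing program of the thesis: the image is taken to be a square list of lists of brightness values whose side is a power of two, a square counts as "homogeneous" when its brightness range does not exceed a tolerance, and the names is_homogeneous and inhomogeneous_squares are hypothetical.

```python
def is_homogeneous(image, row, col, size, tolerance=0):
    """Assumed test: a square is homogeneous if its brightness range is small."""
    values = [image[r][c]
              for r in range(row, row + size)
              for c in range(col, col + size)]
    return max(values) - min(values) <= tolerance


def inhomogeneous_squares(image, min_size=2, tolerance=0):
    """Divide inhomogeneous squares in four, ignoring homogeneous ones.

    Returns the smallest inhomogeneous squares as (row, col, size) triples;
    these are the places where a boundary is likely to pass.
    """
    pending = [(0, 0, len(image))]
    found = []
    while pending:
        row, col, size = pending.pop()
        if is_homogeneous(image, row, col, size, tolerance):
            continue  # homogeneous squares are ignored
        if size <= min_size:
            found.append((row, col, size))  # small enough: stop dividing
            continue
        half = size // 2
        for d_row in (0, half):
            for d_col in (0, half):
                pending.append((row + d_row, col + d_col, half))
    return sorted(found)


# An 8 x 8 picture: dark (0) in columns 0-2, bright (9) in columns 3-7.
picture = [[0, 0, 0, 9, 9, 9, 9, 9] for _ in range(8)]
edge_squares = inhomogeneous_squares(picture)
```

For this picture the surviving squares are the four 2 x 2 squares that straddle the vertical edge between columns 2 and 3. Note that an edge falling exactly on the border between two squares is missed by this naive test, which is one reason a later, more sensitive verification pass is useful.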
The squares are now reduced to lines and vertices.

The resulting analysis gives us the first chance to start
working abstractly now, instead of continuing in "picture-point
space." Preprocessing is finished.




Recognition 



This and the next page describe proposed, but still unfinished,
parts of the system.

What follows is merely a brief summary of the processes in
recognition. A more systematic presentation and classification of
processes in recognition is found in 'Division of work in Computer
Vision', on page 60.

A program would check in the original scene, on both sides of
each line, for continuation across the line, of textures, local cracks,
etc. On these and other grounds, shadows would be picked up and
erased:
A line-proposer program studies the abstract or "symbolic" scene and,
using some heuristics and general principles, proposes places where
it is quite probable that a line is missing:

These places are searched by a line-verifying program, which is an
especially sensitive test that uses fine measurements from the
original scene, and often it will pick up a boundary that was missed
in the less-intelligent homogeneity phase. Here it can be practical
to apply a very strict and sensitive test, because the program
knows very accurately where the line should be, if it really exists
at all. For example, even if the two faces have almost equal
illumination the Eye can pick up a thin, faint highlight from the edge
of the cube. It would have been hopelessly expensive to look for
such detailed phenomena over the whole picture at the start.
At this stage our program SEE (page 58) comes
into action. This program treats different kinds of local
configurations as providing different degrees of evidence
for 'linking' the faces. This evidence is obtained mainly
at vertices, and at boundaries between regions.

A vertex is in general a point of intersection of
two or more boundaries of regions. These regions might or
might not be faces of a single body. SEE examines the
configuration of lines meeting at the vertex to obtain
evidence relevant to whether the regions involved belong
to some object.

For instance, in the vertex configurations "ARROW" and
"FORK" (a complete classification of vertices can be found
below in table 'VERTICES'), the "FORK" suggests linking face a
to face b, b to c, c to a. The "ARROW" links a with b. A "LEG"
(which depends on nearly parallel lines) would add a weak link,
in addition to the ordinary (or strong) link placed by its 'arrow';
a "T" looks for a matching "T", and if found, two strong links are
placed as shown. Also, a "T" counts against (inhibiting, that is)
linking a with c, or b with c.

Figures: "FORK"; "ARROW"; 'LEG' (weak link shown dotted);
Matching T's (two strong links).

These links, for our example, are 




and may be represented as 




[weak links are dotted] 
indicating two groups of linked faces, that is, two bodies:

(BODY 1. IS 1 2 4)
(BODY 2. IS 3 5 6)

If in addition we give at this point to the computer the definition
or concept of a 'triangular prism', through an abstract model of
it {my MS Thesis}, we can get

(1 2 4 IS A TRIANGULAR-PRISM)
(3 5 6 IS A CUBE)

Recognition has finished.
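The way vertices contribute links, and the way links group faces into bodies, can be illustrated with a small sketch. It rests on several assumptions that are not part of SEE: only "FORK" and "ARROW" vertices are handled, an "ARROW" is written with its two linked faces first, the vertex list is invented so as to reproduce the two bodies above, and the grouping is a naive "anything linked belongs together" pass rather than the hierarchical rules described later. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def links_from_vertex(kind, faces):
    """Links suggested by one vertex (simplified: FORK and ARROW only)."""
    if kind == "FORK":       # a FORK links a-b, b-c and c-a
        return [frozenset(pair) for pair in combinations(faces, 2)]
    if kind == "ARROW":      # an ARROW links its first two faces only
        return [frozenset(faces[:2])]
    return []


def collect_links(vertices, background=None):
    """Count the links placed between each pair of faces."""
    links = Counter()
    for kind, faces in vertices:
        for link in links_from_vertex(kind, faces):
            if background not in link:   # links to the background are ignored
                links[link] += 1
    return links


def group_faces(faces, links):
    """Naive grouping: faces joined by any chain of links form one body."""
    parent = {face: face for face in faces}

    def find(face):
        while parent[face] != face:
            parent[face] = parent[parent[face]]
            face = parent[face]
        return face

    for link in links:
        a, b = tuple(link)
        parent[find(a)] = find(b)
    groups = {}
    for face in faces:
        groups.setdefault(find(face), []).append(face)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical vertex list; face 0 stands for the background.
vertices = [("FORK", (1, 2, 4)), ("ARROW", (1, 2, 0)),
            ("FORK", (3, 5, 6)), ("ARROW", (5, 6, 0))]
links = collect_links(vertices, background=0)
bodies = group_faces([1, 2, 3, 4, 5, 6], links)
```

With this invented input, bodies comes out as [[1, 2, 4], [3, 5, 6]], matching the answer quoted above. The naive grouping would be misled by a single misplaced link, which is the weakness the hierarchical scheme of the next section is meant to avoid.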



Analysis of several examples 

A larger variety of kinds of evidence is used in more complicated
scenes, making the program more intelligent in its answers:

(1) The links themselves are inhibited by conditions or configurations
at the neighbor vertices and faces; for instance, in the case
of a "FORK", the (strong) links indicated below are inhibited:

(2) The links to the background are ignored [complete descriptions
of conditions for producing and cancelling links are to be
found in section 'SEE, a program that finds bodies in a scene'].

(3) A hierarchical scheme is used that first finds subsets of faces
that are very tightly linked (e.g., by two or more links).
These "nuclei" then compete for more loosely linked faces
(faces linked through one weak link and one strong link,
or one face completely unlinked, except by one strong link).

By not considering a single link, weak or strong, as enough
evidence for assigning two faces as part of the same object, this
algorithm requires two "mistakes" (that is, two careless placements
of links between regions that should not be considered as
forming the same body) to make an identification error.



The bodies of the following scenes are found by SEE without 
difficulty. 




Note that of the strong links available to the "FORK" marked with
an arrow, two were prohibited or inhibited and only one is produced
by SEE.

Dotted links are weak.

In the following figure, the "FORK" of the big object is missing.





Statement of Rules. We now re-state the rules under (3) of page 24.

Region (definition). Surface bounded by simply closed curves.
We will consider the outer background (:16 in fig 'L10', page 59) *
to be also a region.

Nucleus (definition). A nucleus (of a body) is a set of regions.

Linked nuclei (definition). Two nuclei A and B are linked if
regions a and b are linked, where a ∈ A and b ∈ B.

First rule: If two nuclei are linked by two or more strong links,
they are merged into a larger nucleus.
For instance, regions :8 and :11 are put together, because there
exist two strong links among them, to form the nucleus :8-11.

Maximal nuclei: Starting from nuclei containing individual regions,
we let the nuclei grow and merge under the First rule, until no new
nuclei can be formed. When this is the case, the scene has been
partitioned into several "maximal" nuclei; between any two of these
there is at most one strong link.

For instance, regions :8 and :11 are put together by the First
rule; now we see that region :4 has two links with nucleus :8-11,
and therefore the new nucleus :8-11-14 is formed. This last is a
maximal nucleus.

* For the moment, ignore the colons (:) in front of numbers. The
name of a region is a number preceded by a colon, such as :16.
The First rule is applied again and again, until all nuclei are
maximal nuclei; then the following rule is applied:

Second Rule: If nuclei A and B are joined by a strong and a weak
link, they are merged into a new nucleus.

The Third rule is applied after the Second rule.

Third Rule: If nucleus A consists of a single region, has one link
with nucleus B and no links with any other nucleus, A and B are merged.

(10 11) does not join the bigger nucleus because (10 11) does not
consist of a single region. Below, 9 does not join (7 8) or (4 5)
because 9 has two links:

The Third rule tends to avoid proposing bodies consisting of a 
single region. 
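The three rules can be put together in a short sketch. It is an illustration under assumptions, not the SEE program: links are given directly as (region, region, strength) triples, each rule is applied repeatedly until it no longer changes anything before the next rule starts, the Third rule counts a link of either strength, and the example links are invented to mirror the cases mentioned above. The names links_between, merge_where and find_bodies are hypothetical.

```python
from collections import Counter


def links_between(nuclei, links):
    """Count strong and weak links between each pair of distinct nuclei."""
    index = {region: i for i, nucleus in enumerate(nuclei) for region in nucleus}
    counts = {}
    for a, b, strength in links:
        i, j = sorted((index[a], index[b]))
        if i != j:  # links inside one nucleus carry no further evidence
            counts.setdefault((i, j), Counter())[strength] += 1
    return counts


def merge_where(nuclei, links, rule):
    """Merge pairs of nuclei accepted by `rule` until no pair qualifies."""
    nuclei = list(nuclei)
    while True:
        counts = links_between(nuclei, links)
        for (i, j), count in sorted(counts.items()):
            if rule(nuclei, i, j, count, counts):
                merged = nuclei[i] | nuclei[j]
                nuclei = [n for k, n in enumerate(nuclei) if k not in (i, j)]
                nuclei.append(merged)
                break
        else:
            return nuclei


def first_rule(nuclei, i, j, count, counts):
    return count["strong"] >= 2          # two or more strong links


def second_rule(nuclei, i, j, count, counts):
    return count["strong"] >= 1 and count["weak"] >= 1


def third_rule(nuclei, i, j, count, counts):
    for single in (i, j):
        if len(nuclei[single]) == 1:     # a nucleus made of a single region
            total = sum(sum(c.values())
                        for pair, c in counts.items() if single in pair)
            if total == 1:               # its only link goes to the other nucleus
                return True
    return False


def find_bodies(regions, links):
    nuclei = [frozenset([region]) for region in regions]
    for rule in (first_rule, second_rule, third_rule):
        nuclei = merge_where(nuclei, links, rule)
    return sorted(sorted(nucleus) for nucleus in nuclei)


S, W = "strong", "weak"
example_links = [(7, 8, S), (7, 8, S), (4, 5, S), (4, 5, S),
                 (10, 11, S), (10, 11, S), (6, 8, S), (6, 7, W),
                 (3, 4, S), (9, 7, S), (9, 4, S), (10, 7, S)]
bodies = find_bodies([3, 4, 5, 6, 7, 8, 9, 10, 11], example_links)
```

Here bodies is [[3, 4, 5], [6, 7, 8], [9], [10, 11]]: region 6 joins (7 8) by the Second rule, region 3 joins (4 5) by the Third rule, region 9 stays alone because it has two links, and (10 11) stays apart because it is not a single region. A single strong link between two nuclei never merges them, so one misplaced link alone cannot produce an identification error.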



The next example shows how three "false" links failed to lead
SEE into error:

Here three links were erroneously placed but SEE did not get
confused by them.

In complicated scenes, coincidences cause two objects to line up.
As a result, vertices of different objects are merged, two objectively
different lines appear as one, and so on. The next example illustrates
these phenomena and shows how SEE copes with the problem.
SEE transforms the above scene as follows:

As we see, the nuclei are going to be correctly formed, and SEE will
also analyze this scene correctly.

The bodies do not need to be rectangular, prismatic, or convex. They
only need to be rectilinear. As we will see later, even curved objects
may be identified, under certain restrictions (cf. Table 'ASSUMPTIONS').




Figure 'BRIDGE' 
All the bodies in 'BRIDGE' are adequately found. A new heuristic is
used here:

three parallel lines comprising regions that are not background, and
having the background as a neighbor, and a 'T' in the center line,
originate a strong link, as shown above.

The following locally ambiguous scene is correctly parsed by
our program:

If we add another block to the right, the program makes a mistake and
fails to see one of the inner cubes:





Figure 'MOMO' also gets decomposed accurately: 




Figure 'MOMO' 
The local links allow correct identification of the following body:



If the lateral faces do not have parallel edges, a mistake occurs 
(conservative behavior):






Another mistake occurs in the following scene: 



At left, the above mistake is not produced
because vertex A links :2 and :8, by
the new heuristic introduced in 'BRIDGE'.



Conclusion 

The performance of this program shows that it is possible to
separate a scene into the objects forming it, without needing to know
the objects in detail; SEE does not need to know the
'definitions' or descriptions of a pyramid, or a pentagonal prism,
in order to isolate these objects in a scene containing them, even in
the case where they are partially occluded.

The program will be fully analyzed in the following pages.
Problems in analyzing a visual scene *

The problem of taking a two-dimensional image (or several such
images), and constructing from it a three-dimensional interpretation,
involves many operations that have never been studied, to say nothing
of being realized on a computer. We will list some of these here;
a more complete list is found in my M.S. Thesis {MAC TR 37}; some
have been side-stepped or ignored by the present recognition system;
the problems which we did solve are discussed in the text.

Among the facilities that must be available are:

a) Spatial frame-of-reference: setting up a model of the relation
between the eye(s) and the general framework of the physical task,
i.e., where are the background, the "table" or working surface,
and the mechanical hand(s)?

b) Finding visual objects, and localizing them in space with respect
to the eye-table-background-hand model.

c) Recognizing or describing the objects seen, regardless of their
position, accounting for partly-hidden objects, recognizing objects
already "known" by descriptions in memory and representing the
three-dimensional form of new objects.

d) Building an internal "structural model" of what has been seen,
for the purpose of task-goal analysis.

Among the important factors are the effects of:

1. Both the camera's focus and its depth-of-focus.

2. Illumination of the objects. Light affects the appearance of
objects in obvious and subtle ways -- in scenes with multiple
objects and lights we get complicated shadows, which have to
be detected or rejected. The boundary between two faces may
disappear if they get equal illumination from a diffuse light source.

3. Perspective and distance effects. Even for geometric objects with
flat surfaces, the two-dimensional projection of their surface
features can take many forms, and the system has to be able to deal
with all of them. It works both ways, of course: once identified,
the appearance can give valuable information about the object's
orientation, size, and even (under some conditions) its absolute
spatial location {Roberts 1963}.

4. Accidental vs. essential visual features. Two objects of the same
shape and location can have very different visual presentations
because of their surface textures and markings. We need to
distinguish these two-dimensional "decorations" from real
three-dimensional spatial features.

* Adapted from Status Report II {Minsky 67}. See also Project MAC
Progress Report {1967, 1968}.

Other projects

Here are the main robot groups at a panel discussion
(fall joint computer conference, December 9-10-11, 1968,
San Francisco Civic Center).

Problems in the implementation of intelligent robots

Chairman: DR. BERTRAM RAPHAEL
Stanford Research Institute, Menlo Park, California

This session, the second of three sessions on robotry, will
consist of a panel discussion among technical people involved
in the design and construction of mechanical devices
that are capable of significant independent "intelligent"
behavior, usually by means of computer control. The
projects represented on this panel have drawn upon
state-of-the-art capabilities in many technologies including
mechanical engineering, pattern recognition, heuristic
programming, neural networks and computer systems. Thus,
the discussion which will be conducted at a fairly technical
level should be of interest to engineers and scientists
concerned with the problems of interfacing a variety of
disciplines, as well as to those interested in learning about the
nature of current embryonic "robot" systems.
NOTE: Tickets priced at $5.00 each (including lunch) for
the all-day tour of "live robot" installations on Wednesday,
Dec. 11th, will be available at this session.

Panel Members

MR. L. CHAITIN
Artificial Intelligence Group, Stanford Research Institute
ROBOT STUDIES AT STANFORD RESEARCH INSTITUTE

PROF. J. A. FELDMAN
Computer Science Department, Stanford University
THE ROBOT PROJECT AT STANFORD UNIVERSITY

DR. T. SHERIDAN
Dept. of Mechanical Engineering, MIT
HUMAN CONTROL OF REMOTE COMPUTER MANIPULATORS

MR. R. J. LEE
Air Force Avionics Lab., Wright-Patterson AFB
GENERAL PURPOSE MAN-LIKE ROBOTS

PROF. S. PAPERT
Artificial Intelligence Project, MIT, Project MAC
THE MIT HAND-EYE PROJECT

MR. L. SUTRO
Dept. Aeronautics and Astronautics, MIT
ROBOT DEVELOPMENT AT THE MIT INSTRUMENTATION LABORATORY
RELATED RESEARCH

Previous work by the author

CONVERT

A programming language is described which is applicable to
problems conveniently described by transformation rules. By
this is meant that patterns may be prescribed, each being
associated with a skeleton, so that a series of such pairs may
be searched until a pattern is found which matches an expression
to be transformed. The conditions for a match are governed
by a code which also allows subexpressions to be identified and
eventually substituted into the corresponding skeleton. The
primitive patterns and primitive skeletons are described, as
well as the principles which allow their elaboration into more
complicated patterns and skeletons. The advantages of the
language are that it allows one to apply transformation rules
to lists and arrays as easily as strings, that both patterns and
skeletons may be defined recursively, and that as a consequence
programs may be stated quite concisely.

Abstract of Convert paper in Comm. A.C.M.

Because it is easy to write and modify a program in Convert,
the language has been extensively used to quickly test 'good'
and "great" ideas, new algorithms, etc. It is embedded in
the LISP of the PDP-6 computer (A.I. Group), in the IBM-7094
(Project MAC, MIT), in the CDC-3600 (Uppsala University, Sweden),
and in the SDS-940 (Univ. of California, Berkeley). A paper in the
A.C.M. and {MAC M 305} describe the language; examples of
single programs written in Convert are in {MAC M 346}; a book
article {Patterns and Skeletons in Convert} is oriented
toward the Lisp consumers. For our Spanish readers, two
Bachelor's Theses {Guzman 1965} {Segovia 1967} describe the
language and processors, and give examples.

SCENE ANALYSIS

(1) Polybrick {MAC M 308} {Hawaii 69} is a Convert program that
works on a scene or picture, expressed as a line drawing, and finds
parallelepipeds in it.
(2) We would like to be able to specify in some suitable notation
models of the classes of objects we are interested in (such as 'cube',
'triangular prism', 'chair'), and make a program look for all
instances of any given model in a given scene or figure. Two arguments
would have to be supplied to our program: the model of the object
we are interested in, and the scene that we want to analyze.
Programs to do this are described in {AFCRL-67-0133} and {MAC M 342}.
In these early programs, partially occluded objects get incorrectly
identified. These programs are also written in Convert, and work
by transforming or compiling the model, written in a picture
description language, into a Convert pattern, which searches the
scene for instances of the model.

(3) A Master's Thesis {MAC TR 37} discusses many ways to identify
objects of known forms. Different kinds of models and their
properties are analyzed.

(4) It is important to be able to find the bodies that form a scene,
without knowing their exact description or model. SEE is a program
that works on a scene presumably composed of three-dimensional
rectilinear objects, and analyzes the scene into a composition of
three-dimensional objects. Partially occluded objects are usually
properly handled. This program was discussed in {MAC M 357},
{Guzman FJCC 68} and {Pisa 68}, and this thesis discusses a later
version.

(5) The present thesis goes beyond these topics to discuss also
handling of stereo information (two views, left and right, of the
same scene), improvements to deal with noisy (imperfect) input,
figure-background discrimination, and a few other subjects.

Canaday

Rudd H. Canaday in 1962 analysed scenes composed
of two-dimensional overlapping objects, "straight-sided
pieces of cardboard." His program breaks the image
into its component parts (the pieces of cardboard),
describes each one, gives the depth of each part in the
image (or scene), and states which parts cover which.
Roberts

The problem of machine recognition of pictorial data has long been a 
challenging goal, but has seldom been attempted with anything more com- 
plex than alphabetic characters. Many people have felt that research on 
character recognition would be a first step, leading the way to a more
general pattern recognition system. However, the multitudinous attempts at
character recognition, including my own, have not led very far. The reason, 
I feel, is that the study of abstract, two-dimensional forms leads us away 
from, not toward, the techniques necessary for the recognition of three- 
dimensional objects. The perception of solid objects is a process which can 
be based on the properties of three-dimensional transformations and the 
laws of nature. By carefully utilizing these properties, a procedure has been 
developed which not only identifies objects, but also determines their orien- 
tation and position in space. 

Three main processes have been developed and programed in this report. 
The input process produces a line drawing from a photograph. Then the 
three-dimensional construction program produces a three-dimensional ob- 
ject list from the line drawing. When this is completed, the three-dimen- 
sional display program can produce a two-dimensional projection of the 
objects from any point of view. Of these processes, the input program is the 
most restrictive, whereas the two-dimensional to three-dimensional and 
three-dimensional to two-dimensional programs are capable of handling 
almost any array of planar-surfaced objects. {from Roberts}

Roberts in 1963 described programs that (1) convert
a picture (a scene) into a line drawing and (2) produce
a three-dimensional description of the objects
shown in the drawing in terms of models and their
transformations. The main restriction on the lines is
that they should be a perspective projection of the
surface boundaries of a set of three-dimensional objects
with planar surfaces. He relies on perspective and
numerical computations, while SEE uses a heuristic and
symbolic (i.e., non-numerical) approach. Also, SEE
does not need models to isolate bodies. Roberts' work is
probably the most important and closest to ours.



Mechanical Manipulator Groups (see also page 32).

Currently, several research groups (at Massachusetts
Institute of Technology, at Stanford University,
at Stanford Research Institute) work actively towards
the realisation of a mechanical manipulator, i.e.,
an intelligent automaton that could visually perceive and
successfully interact with its environment, under the
control of a computer. Naturally, the mechanisation of
visual perception forms part of their research, and
important work begins to emerge from them in this area.



THE CONCEPT OF A BODY

In this section definitions of a body or object will be proposed.
The criterion is that they agree in general with the common use of
the word 'body', while at the same time they should lend themselves
to implementation into a computer program.

Introduction

Our ultimate interest is to examine a two-dimensional scene (a
picture, line drawing, or painting), presumably a representation
(projection, photograph) of a three-dimensional scene (a subset of
the "universe" or "real world") and to find in it objects or bodies
contained in the real scene. More specifically, the aim is to find
the two-dimensional representations (projections, photographs) of
the different three-dimensional bodies present in the scene.

The phrase "two-dimensional representation of a
three-dimensional body" will be shortened to "two-dimensional
body" or even to "body", when no confusion arises.

That is, we have to analyse a two-dimensional scene into collections
of two-dimensional entities (surfaces, regions, lines), each of which
makes "three-dimensional sense" as a two-dimensional projection
of a three-dimensional body.

The problem is inherently ambiguous

A scene can be considered as a set of surfaces (faces or regions);
a body belonging to that scene is then an "appropriate" subset of this
collection. Therefore, the problem of finding bodies in a scene is
equivalent to the problem of partitioning the set into appropriate
subsets, each one of them representing or forming a body (scene 'CHURCH').

The problem is inherently ambiguous, since different collections
of three-dimensional bodies can produce the same 2-dim scene; therefore
a given scene can be partitioned in many ways into bodies.
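To give a feeling for how many candidate answers exist, the following sketch enumerates every way of partitioning a set of regions into subsets. It is purely combinatorial and is offered as an illustration only: it ignores all geometric evidence, so it is an upper bound on the decompositions a program would have to choose among, and the function names are hypothetical.

```python
def partitions(regions):
    """Yield every partition of `regions` into non-empty subsets."""
    regions = list(regions)
    if not regions:
        yield []
        return
    first, rest = regions[0], regions[1:]
    for partial in partitions(rest):
        # `first` forms a candidate body of its own ...
        yield [[first]] + partial
        # ... or joins one of the candidate bodies already formed.
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]


def count_partitions(regions):
    return sum(1 for _ in partitions(regions))


three = count_partitions([1, 2, 3])
eight = count_partitions(range(1, 9))
```

Three regions already admit 5 partitions, and a scene of eight regions admits 4140. Only a few of these agree with human opinion, which is why additional evidence about how regions go together is needed.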
It is desired to make a "natural" partition or decomposition
of the scene, natural in the sense that it will agree with
human opinion. *

To define a three-dimensional body is no problem
[a philosopher may disagree, perhaps in singular cases]:

Three-dimensional body (definition). A connected volume limited by a
continuous, two-sided surface composed of portions of planes.
Restriction: The above definition covers only polyhedral bodies,
that is, those having flat faces.
Restriction: No holes.
No-restriction: Bodies do not need to be convex.

Roughly speaking, a three-dimensional body is something that does not
fall apart into pieces when lifted [this may be used as an operational
definition of a body, given a mechanical manipulator to make the
necessary tests].

Figure 'CHURCH'
Set of eight elements. Adequate subsets (bodies) are [2 4],
[1 3 5 6 7 8]. In a more complicated example, people may
differ in their parsing of scenes.

Given a three-dimensional body, we generate a two-dimensional body
by taking a picture of it, as follows.

Two-dimensional body (definition). Figure formed by the projection of
a three-dimensional body. Generally, the projection is isometric or
perspective. Thus, this is a view in two dimensions of a solid body,
from some particular point of view.

Unfortunately, a two-dimensional body could come in this way from
any of several different 3-dim bodies or, what is worse, two 3-dim bodies
together can give rise to a single 2-dim body. For instance, in fig. 'BENT',

* Without such a requirement the problem has a trivial solution
(see Metatheorem in page 39).
Figure 'BENT'
Two blocks, or a bent brick.

this two-dimensional body could be [...] or by two blocks [...]
each other. We are dealing with one three-dimensional body in the
first case, [...] 2-dim [...] (the [...] of figure 'BENT') is the same,
and we are confronted with an [...], which could be the [...] of [...]
bodies, or the picture of a sculpture (one body) in [...].

Figure '[...]'
Such colorful contradictions point towards the need to lay down
a more careful definition of our task. For instance, no one would think
that figure 'CUBE' contains three bodies. Nevertheless (see fig.
'PARALLELEPIPED' in next page), that could be the case.

Fig. 'CUBE'
No one would think...

These two extremes are to be avoided by an appropriate definition
of a body and the corresponding computer program.




Legal scene (definition). That 2-dim scene in which each line is the
     boundary of some region.

[Figure: three small scenes, captioned "Legal scene.", "Illegal." and
"Illegal."]

See also comments to scene R3, and 'Illegal Scenes', in section
'On noisy input'.

Metatheorem. "Every legal scene can always be the projection of one or
     more three-dimensional objects."

To prove it, it suffices to note that each legal scene is composed
of regions, and each of them could be interpreted as the base of a
pyramid, all of whose other faces meet at an apex occluded by the base.

Therefore, each legal scene can be obtained by projecting or
photographing an adequate arrangement of such pyramids.

[Figure caption:] We can always construct a legal scene by
photographing (or projecting) suitable 3-dim polyhedra.
Figure 'PARALLELEPIPED'
An improbable decomposition of a scene.

Trivial partition. Because of the metatheorem, we can always find a
decomposition of a visual scene into three-dimensional bodies; we
call this answer "trivial". Humans do not split scenes this way.
Our program should not, either.

But the metatheorem points out that "impossible scenes" are never
found among the legal scenes (see section 'On Optical Illusions');
these always have at least one interpretation.

We are trying to give criteria for proposing bodies that will
suit our ends, which are to define a "reasonable" or "standard" body.
This will permit us to judge the performance of a program designed
to find objects in a scene.

Several criteria are possible:

1. Roberts {1963} suggests: given several models of three-dimensional
     bodies, use some numerical techniques, such as least squares
     fitting, to find which model fits best through a suitable
     transformation, and accept this match if the error is tolerably
     small. Complicated compositions of elementary bodies are
     considered.

2. Ledley {1962} would propose: in terms of suitable primitive
     components (arcs, legs, etc.), make a syntactical analysis of the
     scene, with the help of a grammar, in such a way that the models
     of the object you want to identify are formed recursively from
     these primitive components and (perhaps) other bodies.
     Narasimhan {1962} and Kirsch {1964} would agree on this
     linguistic approach. A. C. Shaw {Ph. D. Thesis} assents.

3. Guzman {1967} suggests: prepare models which specify a fixed
     topology but where other relations (length of sides, parallelism
     of two lines, equality of angles) are specified through
     the use of open variables (UAR variables, in CONVERT).
     Evans {1968} would agree with that.

These approaches require the existence of a model which describes the
object to be identified; the model specifies a particular 3-dim object
(or a class of them). These approaches are answering more than what
was asked; they tell not only "yes, it is a body", but also
"it is a pyramid". The current question is more general.
It is desired to know if something is a body, any body,
even one which has not been seen before.

If it were possible to implement a program to answer that question,
then that would be a working definition of a body. SEE is a program
which comes close to this goal, so that it could be pragmatically stated:

2-dim body "a la SEE" (definition). A body is each set of regions
     recognised by the program SEE as such.

This definition allows the following
Criticism: A perfect way to hunt lions is to capture any entity E,
     and to call that a lion, by definition.
That is, although this definition is precise, SEE may make
decisions "contrary to common sense"; also, for purposes of judging
the behavior of the program, this definition is useless, since SEE
will be perfect 100 per cent of the time, irrespective of its answers.

We are, finally, tempted to conclude that "common sense", or
better, "human common sense" plays a role in the definition of a body,
since what we are trying to characterise is a usual body, normal body,
common body, etc. But even people may differ in their parsings of
scenes. We could, of course, give a scene (such as the one in page 77)
to 100 subjects, ask them to identify the different bodies in it, and
come up with some sort of 'average' or 'general consensus':

2-dim body (statistical and human-behavioral definition). Each one of
     the subsets into which a scene is partitioned by many subjects.
It is understood that, in this spirit, the human subjects should be
motivated to satisfy a

Simplicity criterion: Of the several "reasonable" interpretations
     (decompositions) of a scene, the one which contains the smaller
     number of bodies is preferable.
That is, an explanation or decomposition is simpler (and preferable)
if it can be done with fewer parts.
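
As a minimal sketch of the simplicity criterion (in Python, not part
of SEE), suppose that the candidate decompositions have already been
judged "reasonable" by some other means. The helper name `simplest` and
the representation of a decomposition as a list of lists of region
numbers are assumptions made only for this illustration; the two
candidates are the adequate and the trivial decompositions of the
eight-element scene of figure 'CHURCH'.

```python
def simplest(candidates):
    """Return the candidate decomposition that uses the fewest bodies.

    Each candidate is a list of bodies; each body is a list of regions.
    All candidates are assumed to be already judged "reasonable".
    """
    return min(candidates, key=len)


# Hypothetical candidates for the eight regions of figure 'CHURCH'.
adequate = [[2, 4], [1, 3, 5, 6, 7, 8]]        # two bodies
trivial = [[region] for region in range(1, 9)]  # one body per region

print(simplest([trivial, adequate]))
```

Note that the criterion only ranks candidates; without a separate
plausibility filter it would always prefer one single, gigantic body.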

Simplicity is not to be achieved at any cost: the parsing of the
scene has to produce 'plausible' bodies, since "simplicity" could
always be achieved if each scene is reported as a single, gigantic
body, obtained perhaps from more familiar ones through liberal
use of adhesives (cf. also Sibelius' Monument).

The chief choices are surely:
     => To choose a parsing, or
     => To list many (perhaps rank-ordered) in case of ambiguity.

If we select the first alternative, further choices are
     => to have a natural parsing (human).
     => to have a canonical parsing, in the sense of minimising
        some variable (the minimisation of the number of bodies
        leads us to Sibelius' Monument, its maximisation to the
        Trivial Solution of the metatheorem (page 41)).

Other kinds of 2-dim data

We have been discussing identification of 3-dim bodies (through their
2-dim projections) in a 2-dim scene, purely on the basis of geometric
regions. Many other kinds of information could be used, such as
texture, color, and shadows.

Nevertheless, it is interesting to see how far the identification
of bodies can go if only geometric properties are used.

Finding bodies in a 2-dim scene is a task not very precisely
defined, because of the ambiguities inherent in any projection process.
On these grounds, the concept of 'body' is best described through
familiarity, human opinion and consensus. We are forced to this because
any scene could be partitioned in several ways (cf. fig.
'PARALLELEPIPED'), only some of which may be considered plausible or
'sensible' (natural, common, standard) partitions in regard to the
bodies forming it.
TOTAL ANALYSIS OF VERTICES

Here a scene is considered as formed by several regions;
bodies are adequate collections of regions. The problem of identifying
bodies is restated as the problem of finding whether two regions
belong or do not belong to the same body. This question is answered
by examining the vertices of the scene.

It is shown that a single vertex never conveys conclusive
evidence, so that at least a pair of vertices is required to isolate a
body; familiar and unfamiliar configurations of objects help to
understand how the vertices are to be used in this task.

Vertices are the important feature

All faces of polyhedra are bounded by edges.
All edges terminate in vertices.

=> This thesis deals with the analysis of visual scenes composed
   mainly by three-dimensional planar objects. These are limited by
   flat surfaces.

=> All these bodies share as a common feature the edge: the place where
   two planes [faces] meet (but see page 57).

=> Wherever several edges or faces meet, a vertex appears. This is
   also a common feature for all the bodies.

A body is formed by vertices with edges connecting some of these.
When a 3-dim body is projected into a 2-dim body, its 3-dim vertices
(which we will call genuine 3-dim vertices) are transformed into
genuine 2-dim vertices, known as images of the 3-dim vertices, as
figure 'GENUINE' (in next page) indicates.

That is, a genuine 2-dim vertex has come from a genuine 3-dim
vertex. Some 2-dim "false" vertices appear too; they do not come
from genuine 3-dim vertices, but rather from the partial occlusion
of parts of opaque bodies [transparent objects give rise to a different
kind of false vertices; Guzman {MS Thesis} deals with them by using
transparent models, and a mode of operation of TD, the recognizer,
that re-interprets or ignores certain types of vertices
{AFCRL-67-0133}]. False vertices do not belong to any object.

Figure 'GENUINE'
Two 3-dim bodies, one of them showing its genuine 3-dim vertices.
A 2-dim scene containing two 2-dim bodies, one of them showing its
genuine 2-dim vertices. Three false vertices also appear.
A genuine vertex (such as G1') is one whose counterimage
(G1 in this case) belongs to some body; a false vertex,
such as F2', is a virtual intersection, and generally
has no counterimage in the 3-dim world. See fig. 'NODES'.

Genuine and false vertices

The classification of vertices into categories "genuine" and "false"
will allow isolation of objects in a picture; in fig. 'GENUINE',
elimination of vertices F1', F2', and F3' divides the genuine nodes of
the network (see fig. 'NODES') into two non-connected components,
A and Q, correctly separating the two bodies.

Figure 'NODES'
False vertices arise from the intersection of two projected edges, one
of which is typically occluded in part by a face bordered by the other.
Elimination of the false vertices F1', F2', and F3' divides the network
in two separate components, which are the bodies sought for.

This suggests the following
2-dim body (first approx. to definition). Set of regions possessing
     only genuine vertices, and separated from other bodies
     by false vertices.
In this way, the problem of identifying bodies is equivalent to the
problem of identifying genuine vertices, segregating the false ones.
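
The separation described for fig. 'NODES' can be sketched as a search
for connected components (in Python, not the language of SEE). The
network below is hypothetical, since the original figure is not
reproduced here: a front body with vertices a1..a4, a partly occluded
body with vertices q1 and q2, and two false vertices F1 and F2 lying
on the edge a2-a3. The sketch assumes that the false vertices are
already known, which is precisely the hard part of the problem.

```python
def bodies_from_vertices(edges, false_vertices):
    """Connected components of the network after removing false vertices."""
    false_vertices = set(false_vertices)
    genuine = {v for edge in edges for v in edge} - false_vertices
    graph = {v: set() for v in genuine}
    for a, b in edges:
        if a in genuine and b in genuine:   # drop edges touching a false vertex
            graph[a].add(b)
            graph[b].add(a)

    components, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                        # depth-first search
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(graph[v] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical network: edge a2-a3 of the front body is interrupted by
# the false vertices F1 and F2, where two edges of the rear body end.
EDGES = [("a1", "a2"), ("a2", "F1"), ("F1", "F2"), ("F2", "a3"),
         ("a3", "a4"), ("a4", "a1"),
         ("q1", "F1"), ("q2", "F2"), ("q1", "q2")]

print(bodies_from_vertices(EDGES, ["F1", "F2"]))
```

With the false vertices removed, the front body stays connected through
a1 and a4, and the rear body forms a second component.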
Problems to be solved

The computation of this equivalence is challenged by several problems:

=> The distribution and position of bodies may be such that false
   vertices look like genuine vertices (fig. 'CAUTION').

        Fig. 'CAUTION'
        That vertex looks genuine, but is false.

   Global information (analysis of more than one vertex) is needed
   in general to distinguish them. In other words, although false
   vertices are those which separate two bodies, and 2-dim genuine
   vertices originate from 3-dim genuine vertices, to segregate
   them requires more than the simple analysis of their shape.

=> Some genuine vertices look like false vertices.

=> Genuine vertices of a body may not be present in the scene, or
   may be supplanted by false vertices.

=> A single body may have totally disconnected sections (portions).

=> Continuation is not clear; some doubts arise if the object in
   the foreground covers one or two bodies (fig. 'CONTINUATION');
   the simplicity criterion prefers the single body interpretation.
Fig. 'CONTINUATION'
Continuation is not clear.

In brief, difficulties are of two kinds:

=> Genuine and false vertices cannot be distinguished
   locally (see Theorem below).

=> Even when they are completely classified, the problem of
   fig. 'CONTINUATION' remains.

The solution of these problems will have to make use of more global
information.



Classification of vertices

The table 'VERTICES' in next page classifies vertices according to
their form, number of lines and angles among the lines. It contains
the most common types; vertices having more edges could have been
included.

Let us consider one of these types, ARROW. Three regions, called
1, 2, and 3, form it. The standard, most common ARROW configuration
is a body with faces 1 and 2 seen against some other object 3. We
indicate this by [ (1 2) (3) ]. However, all other configurations are
possible:

     [ (1) (2) (3) ]     [ (1 3) (2) ]     [ (1) (2 3) ]     [ (1 2 3) ]
'L'.-      Vertex where two lines meet.

'FORK'.-   Three lines forming angles smaller than 180 degrees.

'ARROW'.-  Three lines meeting at a point, with one of the angles
           bigger than 180 degrees.

'T'.-      Three concurrent lines, two of them collinear.

'K'.-      Two of the lines are collinear, and the other two fall on
           the same side of such lines.

'X'.-      Two of the lines are collinear, and the other two fall on
           opposite sides of such lines.

'PEAK'.-   Formed by four or more lines, when there is an angle
           bigger than 180°.

'MULTI'.-  Vertices formed by four or more lines, and not falling
           in any of the preceding types.

TABLE 'VERTICES'
Classification of rectilinear vertices.
Thus, for an ARROW, all the groupings of its faces are possible; any
procedure that, by looking at an ARROW alone, tries to decide how its
faces are grouped into bodies, will make mistakes in some scene.

The generalization of the above analysis to all other types of
vertices proves the following

"Theorem". There does not exist a set of local decision procedures
     {p_i}, each one getting information from one vertex
     and establishing b-equivalences among some of its faces
     (two faces a and b are b-equivalent if the p_i decides that
     they belong to the same body; this is an equivalence relation),
     using information only from that vertex (it does not look at the
     other vertices or at the values of the p's at the other
     vertices), which will partition all scenes correctly.

That is, the following machine will not work for all scenes:

[Figure: the procedures p_i, one per vertex, feeding a box on the
right.]

The p_i decide by processing information at exactly one vertex;
the box in the right accepts all these decisions and passes them as
results. No matter what set of p_i we choose, there exists a scene
that induces an incorrect partition by our machine.
A stronger assertion is that, in view of inherent ambiguity,
there is not even any global procedure.

All the different groupings of regions of a vertex into bodies
are possible; this is illustrated by the following complete set of
scenes, each one of them showing a different partitioning of a type
of vertex. These examples are useful also in giving an idea of
unusual, as well as familiar scenes; we will have later occasion to
use them, when searching for heuristics to form bodies.

Generation of partitions

compo ((1 2))
   1  ((1) (2))
   2  ((1 2))
2

There are only two partitions of a set of two elements.

compo ((1 2 3))
   1  ((1) (2) (3))
   2  ((1 2) (3))
   3  ((1 3) (2))
   4  ((1) (2 3))
   5  ((1 2 3))
5

Partitions of a set of three elements.

compo ((1 2 3 4))
   1  ((1) (2) (3) (4))
   2  ((1 2) (3) (4))
   3  ((1 3) (2) (4))
   4  ((1 4) (2) (3))
   5  ((1) (2 3) (4))
   6  ((1 2 3) (4))
   7  ((1 4) (2 3))
   8  ((1) (2 4) (3))
   9  ((1 2 4) (3))
  10  ((1 3) (2 4))
  11  ((1) (2) (3 4))
  12  ((1 2) (3 4))
  13  ((1 3 4) (2))
  14  ((1) (2 3 4))
  15  ((1 2 3 4))

Partitions of a set of four elements.

Figures in the next few pages are numbered according to the numbers
in the leftmost column in these tables.
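
The tables above can be regenerated with a short recursive procedure.
The following is a sketch in Python, not the original Lisp function
`compo`; the name `partitions` is introduced here for illustration,
and the order in which it emits the partitions is not the order of the
leftmost column of the tables.

```python
def partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or let it form a block of its own.
        yield [[first]] + smaller


for n in (2, 3, 4):
    regions = list(range(1, n + 1))
    print(n, "regions:", len(list(partitions(regions))), "partitions")
```

The counts obtained for two, three and four regions agree with the
sizes of the three tables (2, 5 and 15 groupings).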
[Figures: scenes showing each partition of the regions at a vertex,
by type of vertex (among them CORNER, FORK and PEAK), numbered as in
the tables above. MULTI is not represented here.]



Digression I. An alternate approach 



Suggestion 



As an alternate approach, one could try to use the faces as a 
basis for identification. For instance, use two scenes (left image, 
right image) or pictures, localize a sharp feature in one of them 
(vertex, crack in the face, peculiar texture, etc.) and by correlation 
or some other method, find it also in the other picture. Having 
found a few points in both images in this manner, determine the plane 
of the face, in 3-dim space. When several faces are thus identified, 
we can compute, if desired, their intersection and obtain the edges 
(lines). It will generally suffice to ignore the edges and rely on 
the faces. Since it is reasonable to expect considerable difficulty 
in finding lines and in differentiating lines caused by edges from 
those caused by shadows, an approach which avoids the lines altogether 
looks promising. But in this case, in addition to requiring two 
images, several correlations are needed (if we choose this method), 
a generally time-consuming and error-prone task. 
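
The geometric part of this suggestion (a plane from a few matched
points, and an edge as the intersection of two such planes) can be
sketched as follows, in Python. The sketch assumes that the 3-dim
coordinates of three non-collinear points of each face are already
known, which in practice is the difficult correlation step; the
function names `plane_through` and `edge_of` are introduced here only
for illustration.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_through(p, q, r):
    """Plane through three points, returned as (n, d) with n . x = d."""
    n = cross(sub(q, p), sub(r, p))
    if n == (0, 0, 0):
        raise ValueError("the three points are collinear")
    return n, dot(n, p)

def edge_of(plane1, plane2):
    """Line where two faces meet: (a point on the line, its direction)."""
    (n1, d1), (n2, d2) = plane1, plane2
    u = cross(n1, n2)                 # direction of the edge
    uu = dot(u, u)
    if uu == 0:
        raise ValueError("the two faces are parallel")
    a, b = cross(n2, u), cross(u, n1)
    point = tuple((d1 * x + d2 * y) / uu for x, y in zip(a, b))
    return point, u


top = plane_through((0, 0, 0), (1, 0, 0), (0, 1, 0))    # the plane z = 0
side = plane_through((2, 0, 0), (2, 1, 0), (2, 0, 1))   # the plane x = 2
print(edge_of(top, side))
```

With noisy measurements more than three points per face would be
used, for instance with a least squares fit; that refinement is
outside this sketch.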
SEE, A PROGRAM THAT FINDS BODIES IN A SCENE

Synopsis

How SEE works.

Algorithms and heuristics are presented, implemented in a
program, that analyze a scene into a composition of three-dimensional
objects. Only the two-dimensional representation of the three-
dimensional scene is available as input, and is described by a
collection of surfaces, lines and vertices.

SEE looks for three-dimensional objects in two-dimensional scenes.
The program does not require a pre-conceived idea of the form of the
objects which could appear in the scenes. It is only assumed that
they will be solid objects formed by plane surfaces. Thus, SEE can
not find "pentagonal prisms" or "houses" in a scene, since it does
not know what a "pentagonal prism" is; but it will usually isolate
the pentagonal prisms (or any other regular or irregular solid) in a
scene, even if some of them are partially occluded, without having
a description of such objects. It does this by paying attention
to configurations of surfaces and lines which would make plausible
three-dimensional solids, and in this way 'bodies' are identified.

The analysis that SEE makes of the different scenes generally
agrees with human opinion, although in some ambiguous cases it
tends to be conservative. The most interesting thing about the
program is how well it deals with occlusions. Many examples in
the next section 'Analysis of many scenes' illustrate the features
and peculiarities of the program, and also illustrate the effects
of inaccuracies introduced in the data.

Figure 'L10'
A scene analysed by SEE.
INTRODUCTION

Here is a program that locates objects in an optical image of a
scene most likely composed by three-dimensional solids, perhaps
occluding one another, so that some of them may not be totally
visible. We use a line drawing as our representation of the scene.

The analysis of scene L10 (see figure 'L10') by our program,
named SEE, produces

     (BODY 1. IS :5 :1 :4 :12)
     (BODY 2. IS :6 :15 :7 :11 :14)
     (BODY 3. IS :8 :9 :10 :3)
     (BODY 4. IS :2 :13)

Division of work in computer vision

In trying to construct a program for seeing, several approaches
are possible; most of them require some of the following set of
modular programs or subroutines.

Pre-processing. Converts the image from a 2-dim array of intensities
     to a symbolic representation or 'internal format' (page 66), in
     terms of vertices and lines connecting them.
Homogeneity predicates. They decide if areas of the picture are
     inhomogeneous, and hence require further analysis.
Color predicates. Boundaries of different color suggest lines.
Line finder. Locates lines of points having a certain property
     (such as being inhomogeneous, or having a large light intensity
     gradient).
Vertex finder. Concurrent lines are merged, or a vertex is created
     at their meeting point.
Consolidator. Eliminates the false lines and finds more lines,
     increasing in this way as much as possible the reliability of
     the system.
Illumination program. Discovers where the main light sources are.
Shadows program. Detects shadows so as to eliminate them.
Missing lines program. General shape considerations suggest places
     where faint lines can remain undetected.
Body recognition. Partitions the scene into appropriate subsets, each
     one being a body or object. Thus, SEE is a body-recognition
     program.
Object identification. These objects are compared against abstract
     descriptions (models) of cubes, pyramids, etc., so that a
     classification is done, and a name is attached to each one. In
     the process, certain parameters may acquire values: the height
     of the pyramid is observed.
Positioning. Having analyzed the scene, the relevant objects are
     positioned in three-dimensional space, and additional relations
     among them are discovered (support, obstruction, etc.). Enough
     information is obtained to allow the mechanical arm to manipulate
     the objects and achieve its goals.
Stereo. More than one view is analyzed and, from them, 3-dim spatial
     positions are found.
Focussing. The computer, by adjusting the focus of its lens,
     acquires knowledge of how far the objects are.

Feedback among these parts is more necessary as the complexity of the
scene and of the desired goals increases.

Recognizer. The task of body recognition and body identification was
formerly accomplished by a single program (for instance, DT or TD {my
MS Thesis}) that compares the symbolic description of the scene against
the symbolic or abstract description of the model of the desired object,
in a kind of two-dimensional matching, to isolate instances of that
object in the scene.

Technical descriptions of SEE

1. Annotated listings. Above all, the primary source of information
   is the listing of the programs, which appears complete in this
   thesis. They are written in Lisp. If, despite my efforts, some of
   my explanations are not clear, consult the listing: it is annotated.
   The programs themselves, examples, test data, results, instructions,
   etc., are in the DEC magnetic tape "GUZMAN F" at Project MAC
   (AI group). Instructions are given in page 78.

2. This section of the thesis contains a description and discussion
   of the different algorithms and procedures used.

3. Published papers that cover part of the material at somewhat
   less depth, and therefore are more readable, are also available
   {FJCC 68} {Pisa 68}. Except for some examples not included here,
   they contain no information not covered here.

4. An internal report {MAC M 357} described an earlier version of SEE.

FIGURE 'R3'
A scene.
INPUT FORMAT

Eventually, several preprocessors will be able to receive data
through an input camera and reduce it to the "internal format" of a
scene, in the form required by SEE. For testing purposes, the scenes
are entered by hand in a simplified format, called 'input format',
to be described now. All the scenes analyzed by SEE have been written
in input format.

Example. R3. The input format of scene R3 is

(DEFPROP R3 (%:7) BACKGROUND)
(NOT (SETQ R3 (QUOTE (
%A  4.3    4.5   (%:7 %G %:4 %C %:1 %B)
%B  4.0    5.7   (%:7 %A %:1 %D)
%C  4.8    8.5   (%:4 %F %:2 %D %:1 %A)
%D  4.5    9.15  (%:7 %B %:1 %C %:2 %E)
%E  5.65   9.25  (%:7 %D %:2 %F)
%F  5.85   8.6   (%:7 %E %:2 %C %:4 %G)
%G  6.6    5.2   (%:7 %F %:4 %A)
%H  6.9   15.4   (%:7 %L %:3 %K %:5 %I)
%I  8.5   16.0   (%:7 %H %:5 %J)
%J  11.8  12.6   (%:7 %I %:5 %K %:6 %N)
%K  10.0  11.9   (%:6 %J %:5 %H %:3 %M)
%L  7.1   13.2   (%:7 %M %:3 %H)
%M  10.0   9.7   (%:7 %N %:6 %K %:3 %L)
%N  11.65 10.3   (%:7 %J %:6 %M)
))))

R3 IN INPUT FORMAT

The first line declares :7 to be the background.* We have to
tell SEE which regions belong to the background. If this information
is missing, a program is called that will compute the regions that
belong to the background (see section 'Background discrimination by
computer') prior to other calculations.

After that, the lines associate with each vertex its 2-dim
coordinates and a list (which will later be called 'KIND'), in
counterclockwise order, of regions and vertices radiating from that
vertex.

The function PREPARA (see listing) converts the scene as just given
to the "internal format" form which SEE expects. It does this by putting
many properties in the property lists of the atoms representing vertices
and regions (property lists in Lisp are explained in next page).

*For the moment, ignore the % signs. They are used to distinguish
right from left scenes.
Property lists in Lisp*

Each atomic expression in Lisp has a property list, which is a place
where facts can be stored.

If it is desired to represent the fact that John is a 69 years
old male, has a wife called Jacqueline, and a height of value 1.77 m,
we could proceed in Lisp as follows:

(1) We will agree that the atom 'JOHN' will represent our man.

(2) In the property list of 'JOHN' we will store several properties
    or indicators and their values, using the function PUTPROP, that
    stores information in the property list; thus
         (Putprop (quote John) (quote Jacqueline) (quote Wife))
    will add, under the indicator or property 'Wife', the value
    'Jacqueline':

         JOHN
          |
         WIFE -- JACQUELINE

(3) Hence, the representation of our facts in Lisp is

         JOHN
          |
         SEX -- MALE
          |
         AGE -- 69.0
          |
         WIFE -- JACQUELINE
          |
         HEIGHT -- (1.77 m)

(4) In fact, the property list of 'JOHN', which is the CDR of 'JOHN'
    in Lisp 1.6 {MAC M 313}, is
         (SEX MALE AGE 69.0 WIFE JACQUELINE HEIGHT (1.77 m) ...)

(5) If later we want to know the age of John, we will ask
         (Get (quote John) (quote Age))
    and the value will be 69.0

* This paragraph, which can be skipped if it is known what a
property list is, will make the next section clearer.
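
For readers more used to present-day languages, the behavior of a
property list can be imitated with a table of tables. The following
Python sketch is only an analogy, not Lisp 1.6: the names `putprop`
and `get` copy the Lisp functions used above (including the argument
order atom, value, indicator), and `PROPERTY_LISTS` is a name
introduced here.

```python
PROPERTY_LISTS = {}   # atom name -> {indicator: value}


def putprop(atom, value, indicator):
    """Store `value` under `indicator` in the property list of `atom`."""
    PROPERTY_LISTS.setdefault(atom, {})[indicator] = value
    return value


def get(atom, indicator):
    """Return the value stored under `indicator`, or None if absent."""
    return PROPERTY_LISTS.get(atom, {}).get(indicator)


putprop("JOHN", "MALE", "SEX")
putprop("JOHN", 69.0, "AGE")
putprop("JOHN", "JACQUELINE", "WIFE")
putprop("JOHN", (1.77, "m"), "HEIGHT")

print(get("JOHN", "AGE"))
```

An indicator that was never stored yields None here, playing the role
of NIL.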
INTERNAL FORMAT

The program assumes the scene in a special symbolic format,
which, basically, is an arrangement of relations between vertices and
regions, which are represented by atoms having adequate properties
in their property lists.

A scene has a name which identifies it; this name is an atom
whose property list contains the properties 'REGIONS', 'VERTICES',
and 'BACKGROUND'. For example, the scene R3 (see figure 'R3') has the
name 'R3'. In the property list of R3 we find (see also table
'R3 IN INTERNAL FORMAT')

REGIONS     (%:6 %:5 %:3 %:2 %:1 %:4 %:7)
                 Unordered list of regions
                 composing the scene R3.

VERTICES    (%N %M %L %K %J %I %H %G %F %E %D %C %B %A)
                 Unordered list of vertices
                 composing the scene R3.

BACKGROUND  (%:7)
                 Unordered list of regions
                 composing the background of
                 scene R3.

Region

A region corresponds to a surface limited by simple closed curves.
Regions are represented by atoms that start with a colon (:). For
instance, in R3, the surface delimited by the vertices K J N M is a
region, called :6, but D E F G A C is not.

Each region has as name an atom which possesses additional
properties describing different attributes of the region in question.
These are 'NEIGHBORS', 'KVERTICES', and 'FOOP'. For example, the region
in scene R3 formed by the lines DE, EF, FC, CD has ':2' as its name.
In the property list of :2 we find:

NEIGHBORS   (%:4 %:7 %:7 %:1)
                 Counterclockwise ordered list of
                 all regions which are neighbors to
                 :2. For each region, this list is
                 unique up to cyclic permutation.
KVERTICES   (%F %E %D %C)
                 Counterclockwise ordered list of
                 all vertices which belong to
                 region :2. This list is unique
                 up to cyclic permutation.

FOOP        ((%:4 %F %:7 %E %:7 %D %:1 %C))
                 Each sublist is a counterclockwise
                 ordered list of alternating
                 neighbors and kvertices of :2.
                 Each sublist is unique up to cyclic
                 permutation, and indicates a
                 simple boundary.

Each sublist of the FOOP property of a region is formed by a
man who walks on its boundary always having this region to his left,
and takes note of the regions to his right and of the vertices which
he finds in his way.

As another example, in the property list of :7 we find:

NEIGHBORS   (%:6 %:6 %:3 %:3 %:5 %:5 %:2 %:2 %:4 %:4 %:1 %:1)

KVERTICES   (%N %M %L %H %I %J %E %F %G %A %B %D)

FOOP        ((%:6 %N %:6 %M %:3 %L %:3 %H %:5 %I %:5 %J)
             (%:2 %E %:2 %F %:4 %G %:4 %A %:1 %B %:1 %D))
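
The walk that produces FOOP can be sketched from the KIND lists of the
input format alone. The following Python sketch is not the function
PREPARA of the listing; it is one possible reading of the description
above. It drops the % prefixes, introduces the name `foop` for
illustration, and assumes that a region touches a given vertex at most
once. In a KIND list the vertex just before a region is the next
vertex of the walk, and the region just before that vertex is the
region seen to the right.

```python
# KIND lists of scene R3: regions and vertices alternate,
# counterclockwise around each vertex.
KIND = {
    "A": [":7", "G", ":4", "C", ":1", "B"],
    "B": [":7", "A", ":1", "D"],
    "C": [":4", "F", ":2", "D", ":1", "A"],
    "D": [":7", "B", ":1", "C", ":2", "E"],
    "E": [":7", "D", ":2", "F"],
    "F": [":7", "E", ":2", "C", ":4", "G"],
    "G": [":7", "F", ":4", "A"],
    "H": [":7", "L", ":3", "K", ":5", "I"],
    "I": [":7", "H", ":5", "J"],
    "J": [":7", "I", ":5", "K", ":6", "N"],
    "K": [":6", "J", ":5", "H", ":3", "M"],
    "L": [":7", "M", ":3", "H"],
    "M": [":7", "N", ":6", "K", ":3", "L"],
    "N": [":7", "J", ":6", "M"],
}


def foop(kind, region):
    """Boundary loops of `region`, walked with the region on the left."""
    step = {}   # vertex -> (next vertex, region to the right of that edge)
    for vertex, k in kind.items():
        for i in range(0, len(k), 2):
            if k[i] == region:
                step[vertex] = (k[i - 1], k[i - 2])   # indices wrap around
    loops, seen = [], set()
    for start in step:
        if start in seen:
            continue
        loop, v = [], start
        while v not in seen:
            seen.add(v)
            nxt, right = step[v]
            loop += [right, nxt]
            v = nxt
        loops.append(loop)
    return loops


for loop in foop(KIND, ":2"):
    print("NEIGHBORS", loop[0::2], "KVERTICES", loop[1::2])
```

For region :2 the loop obtained is the FOOP sublist shown above; for
the background :7 two loops result, which agree with the two sublists
above up to cyclic permutation.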



Vertex

A vertex is the point where two or more lines of the scene
meet; for instance, A, G, and K are vertices of the scene R3. Each
vertex has as name an atom which possesses additional properties
describing different attributes of the vertex in question. These are
'XCOR', 'YCOR', 'NVERTICES', 'NREGIONS', 'KIND', 'TYPE', and 'NEXTE'.
For example, vertex J (see scene R3) has in its property list:

XCOR        11.799999
                 x-coordinate

YCOR        12.600000
                 y-coordinate

NVERTICES   (%I %K %N)
                 Counterclockwise ordered list of
                 vertices to which J is connected.
                 Unique up to cyclic permutation.
NREGIONS    (%:7 %:5 %:6)
                 Counterclockwise ordered list of
                 regions to which J is connected.
                 Unique up to cyclic permutation.

KIND        (%:7 %I %:5 %K %:6 %N)
                 Counterclockwise ordered list of
                 alternating nregions and nvertices
                 of J. This list is unique up to
                 cyclic permutation.

TYPE        (ARROW (%K %J %I %N %:5 %:6 %:7))
                 List of two elements; the first is
                 an atom indicating the type-name
                 of J; the second is the datum of J.
                 To be explained in next section.

(NEXTE)          Vertex J does not have the indicator
                 NEXTE in its property list.

The KIND property of a vertex is formed by a man who stands at
the vertex and, while rotating counterclockwise, takes note of the
regions and vertices which he sees. NREGIONS and NVERTICES are then
easily derived from KIND, by taking its odd positioned elements, and
its even positioned elements, respectively.
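
As a small illustration (in Python rather than Lisp, with the %
prefixes dropped and the name `split_kind` introduced here), the
derivation of NREGIONS and NVERTICES from the KIND of vertex J is:

```python
def split_kind(kind):
    """Return (NREGIONS, NVERTICES) from a KIND list.

    The 1st, 3rd, 5th ... elements are regions; the 2nd, 4th, 6th ...
    elements are vertices.
    """
    return kind[0::2], kind[1::2]


kind_of_j = [":7", "I", ":5", "K", ":6", "N"]
nregions, nvertices = split_kind(kind_of_j)
print("NREGIONS ", nregions)
print("NVERTICES", nvertices)
```

Both lists keep the counterclockwise order of KIND.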

NEXTE is a property that appears in certain vertices (none in
scene R3); it will be explained in next section.

The property TYPE is also put by the function PREPARA; it
classifies each vertex into one of several types, as described in
table 'VERTICES' (next page).




'L'.-      Vertex where two lines meet.

'FORK'.-   Three lines forming angles smaller than 180 degrees.

'ARROW'.-  Three lines meeting at a point, with one of the angles
           bigger than 180 degrees.

'T'.-      Three concurrent lines, two of them collinear.

'K'.-      Two of the lines are collinear, and the other two fall on
           the same side of such lines.

'X'.-      Two of the lines are collinear, and the other two fall on
           opposite sides of such lines.

'PEAK'.-   Formed by four or more lines, when there is an angle
           bigger than 180°.

'MULTI'.-  Vertices formed by four or more lines, and not falling
           in any of the preceding types.

TABLE 'VERTICES'
Classification of rectilinear vertices.
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TYPES OF VERTICES 

The disposition, slope and number of lines which form a vertex
are used to classify it, a task performed by the function
(TYPEGENERATOR L.), which stores the corresponding type in the
property list of the vertex.

The TYPE of a vertex is always a list of two elements; the first
is the type-name: one of 'L', 'FORK', 'ARROW', 'T', 'K', 'X', 'PEAK',
'MULTI'; the second element is the datum, which generally is a list,
whose form varies with the type-name and contains information in a
determined order about the vertex in question (see table 'VERTICES').
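As a rough illustration of how the type-name could be decided from the lines at a vertex, the sketch below classifies a vertex from the directions (in degrees) of the lines leaving it. It is not TYPEGENERATOR: the function name `classify_vertex`, the angle-list input and the tolerance `eps` are assumptions made for this example, and the datum is not computed.

```python
def classify_vertex(directions, eps=1e-6):
    """Return the type-name of a vertex given the directions, in degrees,
    of the lines that leave it (hypothetical input format)."""
    angles = sorted(d % 360.0 for d in directions)
    n = len(angles)
    if n < 2:
        raise ValueError("a vertex needs at least two lines")
    if n == 2:
        return "L"
    # Angle between each line and the next one, going counterclockwise.
    gaps = [(angles[(i + 1) % n] - angles[i]) % 360.0 for i in range(n)]
    biggest = max(gaps)
    if n == 3:
        if abs(biggest - 180.0) <= eps:
            return "T"                      # two of the lines are collinear
        return "ARROW" if biggest > 180.0 else "FORK"
    if n == 4:
        for i in range(n):
            for j in range(i + 1, n):
                if abs((angles[j] - angles[i]) % 360.0 - 180.0) <= eps:
                    # Lines i and j are collinear; see on which side of
                    # them the other two lines fall.
                    others = [angles[k] for k in range(n) if k not in (i, j)]
                    sides = [((a - angles[i]) % 360.0) < 180.0 for a in others]
                    return "K" if sides[0] == sides[1] else "X"
    return "PEAK" if biggest > 180.0 + eps else "MULTI"


print(classify_vertex([0, 30, 330]))       # one angle bigger than 180 degrees
print(classify_vertex([0, 90, 180, 270]))  # two crossing lines
```

Real line drawings are noisy, so an actual classifier would need a much looser tolerance than the one assumed here.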

Vertices where two lines meet.

L. - A vertex formed by only two lines is always classified as of
type 'L'. Two angles exist at it, one bigger and the other smaller
than 180°. The datum is a list of the form (E1 E2), where

E1 is the region which contains the angle smaller than 180°.
E2 is the region which contains the angle greater than 180°.

For instance, in scene R3 (see fig. 'R3'), G has in its property list:

TYPE (L (%:4 %:7))

The vertices of type L present in R3 are B, E, G, I, L, N.



Vertices where three lines meet.

FORK. - Three lines meeting at a point and forming angles smaller
than 180° form a FORK. Its datum is the vertex itself at which the
fork occurs. For instance, vertex K has in its property list

TYPE (FORK %K)

The vertices of type FORK present in R3 are C, K.



ARROW. - Three lines meeting at a point, with one of the angles bigger
than 180°.
The datum of an ARROW is a list like (E1 E2 E3 E4 E5 E6 E7), where

E1 is the vertex at the 'tail'.
E2 is the vertex at the center.
E3 is the vertex at the left of E2.
E4 is the vertex at the right.
E5 is the region at the left.
E6 is the region at the right.
E7 is the region which contains the angle bigger than 180°.

For instance, vertex H has in its property list

TYPE (ARROW (%K %H %L %I %:3 %:5 %:7))

The vertices of type ARROW present in R3 are A, D, H, J, M.

T. - Three concurrent lines, of which two are collinear.

The datum for a T is a list of the form (E1 E2 E3 E4 E5 E6 E7), where

E1 is the vertex at the 'tail' of the T.
E2 is the central vertex.
E3 is a vertex such that E1 E2 E3 is an angle between 90 and 180
   degrees.
E4 is a vertex such that E1 E2 E4 is an angle smaller than 90 degrees.
   That is, E3 E2 E4 are collinear.
E5 is the region which contains the angle between 90 and 180 degrees.
E6 is the region which contains the angle smaller than 90 degrees.
E7 is the "central" region (where the 180° angle is).

For instance, vertex F (fig. R3) has in its property list

TYPE (T (%C %F %G %E %:2 %:4 %:7))

The only vertex of type T present in R3 is F.
See also "Matching T's or Nextes" below.

Vertices where four lines meet.

K. - When two of the lines are collinear, and the other two fall on the
same side of such lines. The datum is a list of the form
(E1 E2 E3 E4 E5 E6 E7 E8), where

E1 is the central region.
E2 is the region having the 180° angle.
E3 is the collinear vertex which falls to the left.
E4 is the region to the left.
E5 is the other vertex to the left.
E6 is the collinear vertex which falls to the right.
E7 is the region to the right.
E8 is the other vertex to the right.

R3 contains no vertices of type K. PA of figure BRIDGE is of type 'K'.
X. - When two of the lines are collinear, and the other two
fall on opposite sides of such lines. The datum is a list of the form
(E1 E2 E3 E4 E5 E6), where

E1 is one of the collinear vertices.
E2 is the region to the left of E1 C, where C is the vertex at the
   center.
E3 is the region to the right of E1 C.
E4 is the other collinear vertex.
E5 is the region to the left of E4 C.
E6 is the region to the right of E4 C.

For instance, we find in the property list of F (figure BRIDGE):

TYPE (X (QA :26 :22 G :21 :30))

The only vertex of type X present in BRIDGE is F.

The datum for an X may also be in the form (E4 E5 E6 E1 E2 E3).
Vertices of four lines which are not of type K or X are either of
type PEAK or MULTI.

Other types of vertices.

PEAK. - Formed by four or more lines, when there is an angle bigger
than 180°.

MULTI. - Vertices formed by four or more lines, and not falling in any
of the preceding types, belong to the type MULTI. R3 contains
no PEAKs or MULTIs.

The datum for vertices of type PEAK is of the form (E1 E2 E3), where
E2 is the region that contains the angle bigger than 180 degrees;
E1 is the vertex before E2, and E3 is the vertex after it.

The datum for vertices of type MULTI is of the form E1, where
E1 is the vertex itself.

NEXTEs or Matching T's. Two T's which are collinear and facing each
other (see figure) are called "matching T's", and each one is the
"nexte" of the other. The indicator "NEXTE" is placed in such vertices.
If the region E7 of a T (see figure) is the background, that
T can not be a matching T.

In the figure, E2 and F2 are matching T's because E1-E2 is
collinear with F1-F2. It is not required of E3-E4 to be parallel to
F3-F4. If several pairs of T's are possible, the closest is chosen:

P - Q are matching T's, and not P - R.

The matching T's will get involved in the determination of places
where a body is occluded by another object and later emerges visible
again.

For two T's to be NEXTEs or matching T's, it is required that
neither E7 nor F7 be background. This requirement could be extended to
all regions between E7 and F7, since a line can not go "under" the
background region:



A and B can not be NEXTEs, since :11 is the background.

Two straight lines always intersect (perhaps at infinity); a way to
detect these cases is to write functions (subroutines) that find out
if two segments of line intersect, or if one segment intersects with
a line.

LINES AND SEGMENTS
In the plane, two straight lines always meet.
Two segments, or a line and a segment, may or
may not meet.
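The two tests mentioned above can be sketched with signed areas (cross products). This is a standard textbook formulation given only as an illustration; it is not taken from the listing of SEE, and the names `cross`, `segments_intersect` and `line_meets_segment` are assumptions.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o-a-b (positive if counterclockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def within_box(p, q, r):
    """True if r lies inside the bounding box of segment p-q."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, p3, p4):
    """True if segment p1-p2 and segment p3-p4 share at least one point."""
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True                      # proper crossing
    # Collinear or touching cases.
    if d1 == 0 and within_box(p3, p4, p1):
        return True
    if d2 == 0 and within_box(p3, p4, p2):
        return True
    if d3 == 0 and within_box(p1, p2, p3):
        return True
    if d4 == 0 and within_box(p1, p2, p4):
        return True
    return False


def line_meets_segment(a, b, s1, s2):
    """True if the infinite line through a and b meets segment s1-s2."""
    return cross(a, b, s1) * cross(a, b, s2) <= 0


# The segment from (0,0) to (1,0) stops short of the vertical segment
# at x = 2, but the line through it does reach that segment.
print(segments_intersect((0, 0), (1, 0), (2, -1), (2, 1)))
print(line_meets_segment((0, 0), (1, 0), (2, -1), (2, 1)))
```

With coordinates read from a noisy line drawing, the exact comparisons with zero would have to be replaced by a tolerance.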
FIGURE 'TOWER'

FIGURE 'MOMO'

THE PROGRAM



We now describe SEE, and how it achieves its goals, by discussing
the procedures, heuristics, etc., employed and the way they work.
We begin with several examples.

Example A. Scene 'TOWER'. This scene (see figure 'TOWER') is
analyzed by SEE, with the following results:

RESULTS

(BODY 1. IS :2 :3 :1)
(BODY 2. IS :15 :5 :4)
(BODY 3. IS :23 :17)
(BODY 4. IS :6 :7 :8)
(BODY 5. IS :10 :11 :9)
(BODY 6. IS :13 :14 :12)
(BODY 7. IS :16 :22)
(BODY 8. IS :20 :19 :21)

Results for scene TOWER



Example B. Scene 'MOMO'. Details of the program's operation are
given (skip to the next page, if you wish).

↑Z $L SEE 1             Go to DDT and load file SEE 1 (in tape
                        GUZMAN F), a binary dump of the program SEE.

$G                      Start.

(UREAD MOMO SI 3) ↑Q    Read the file MOMO SI (in tape GUZMAN C)
                        from tape drive 3.

(PREPARA MOMO)          Convert MOMO from its Input Format form
                        to Internal Format, the proper form that
                        SEE expects.

(SEE (QUOTE MOMO))      Call SEE to work on MOMO.

Results appear in next page.

Notes: ↑Z (control Z) is keyed by striking the Z key while holding
       down simultaneously the CONTROL key.
       ^ denotes carriage return.
       $ denotes the character "alt. mode".
SEE 58 ANALYZES MOMO
EVIDENCE
LOCALEVIDENCE
TRIANG
GLOBAL

((NIC) (0^8) G0044 G0043 &0041 G0040) ((U9) G0046 GOQaS GOett.. 

LOCAL
(LOCAL ASSUMES (:17) (:9) SAME BODY)
(LOCAL ASSUMES (:9 :17) (:18) SAME BODY)

((NIL) (NIL) ((»6)) (NIL) (NIL) (NIL) ( ( «"36 «37 «39) G0043 etc- 

LOCAL

(n«3 »2 :i) G0081 G0029 G0030 (50028) ( ( »32 »33 «27 »2o> GOeCt- 

LOCAL 

SMB 

RESULTS
(BODY 1. IS :3 :2 :1)
(BODY 2. IS :32 :33 :27 :26)
(BODY 3. IS :28 :31)
(BODY 4. IS :20 :34 :19 :30 :29)
(BODY 5. IS :36 :35)
(BODY 6. IS :24 :5 :21 :4)
(BODY 7. IS :25 :23 :22)
(BODY 8. IS :14 :13 :15)
(BODY 9. IS :10 :16 :11 :12)
(BODY 10. IS :17 :18 :9)
(BODY 11. IS :7 :8)
(BODY 12. IS :38 :37 :39)
NIL

RESULTS FOR MOMO



Most of the scenes contain several "nasty" coincidences: a vertex of
an object lies precisely on the edge of another object; two nearly
parallel lines are merged into a single one, etc. This has been
done on purpose, since a non-sophisticated pre-processor will tend to
make this kind of error.



Example C. R3. Analysis by SEE gives

(BODY 1. IS %:2 %:1 %:4)
(BODY 2. IS %:6 %:5 %:3)

RESULTS FOR 'R3'

The % sign indicates the dextral scenes (cf. page 233). The signs
may be ignored.
The Parts of SEE

The program is straightforward; it does not call itself recursively;
it does not do "pattern matching"; it does not do tree search. It is
formed by several main parts, sequentially executed. They are

LINKS FORMATION. An analysis is made of vertices, regions and
     associated information, in search of clues that indicate that
     two regions form part of the same body. If evidence exists that
     two regions in fact belong to the same body, they are linked
     or marked with a "gensym" (both receive the same new label).*
     There are two kinds of links, called strong (global) or weak
     (local).

     Some features of the scene will weakly suggest that a group
     of regions should be considered together, as part of the same
     body. This part of the program is that which produces the
     'local' links or evidences.

NUCLEI CONSOLIDATION. The 'strong' links gathered so far are
     analyzed; regions are grouped into "nuclei" of bodies, which grow
     until some conditions fail to be satisfied (a detailed
     explanation follows later).

     Weak evidence is taken into account for deciding which of
     the unsatisfactory global links should be considered
     satisfactory, and the corresponding nuclei of bodies are then
     joined to form a single and bigger nucleus.

BODY RETOUCHING. If a single region does not belong to a larger
     nucleus, but is linked by one strong evidence to another region,
     it is incorporated into the nucleus of that other region. If
     necessary, more nuclei consolidation could be done after this
     step.

     A last attempt is made to associate the remaining single
     regions to other bodies.

The regions belonging to the background are screened out, and the
results are printed.


* In LISP, a "gensym" (generated symbol) is a new atomic symbol,
previously unused.

Auxiliary Routines

Three functions are used constantly, and will be described now.

THROUGHTES. "Through a chain of T's." Allows properties or
configurations to extend along straight lines; for instance, the
property "'A' has as neighbor an L" may be extended so as to say
"throughtes, 'A' has as neighbor an L".

[Schematic figure of a chain of T's.]

Strict definition. The chain is defined as one of the four
configurations (1) to (4) shown in the figure. In (1), the two
vertices at both sides of the chain are in fact the same; (2)
involves matching T's.

See also annotations on listing.

GOODT. If a vertex V is considered a "good T", (GOODT V) is TRUE;
false otherwise.

(GOODT V) = F if V is not a "T".
            F if [condition shown in a figure].
            T if V has a NEXTE.
            F if [condition shown in a figure, involving parallel lines].
            F if [condition shown in a figure, involving parallel lines].
            T otherwise.

As we see, this function tries to distinguish between T's originated
by occlusion, such as O, and T's originated by accident (A).



NOSABO. "Not same body." Acts as a link inhibitor.

If consulted, (NOSABO .. V ..) will inhibit, in the following
conditions, the link that vertex V may have created:

[Figure: conditions (1) to (5), each marking the inhibited link
(prohibited, ignored, forbidden, not created). Labels legible in the
figure include ARROW and PEAK.]

Nosabo tries to find conditions indicating that two regions should
not be considered as part of the same body; hence, if consulted,
Nosabo may forbid a link among them. Some heuristics place links
without asking Nosabo's approval and Nosabo can not "erase" a link
placed without its authorization.

If none of conditions (1) to (5) is met, Nosabo will be False,
indicating no inhibition was found, and it is up to the program that
asked Nosabo's opinion to lay or fail to lay the link in question.
We proceed now to explain in considerable detail each of the parts
of SEE. This will help the reader to understand the behavior of
the program, its strengths and deficiencies.



LINK FORMATION

Several subroutines are devoted to creating weak and strong
links. See also Listing.



F^^^ Removes several unwanted properties. 



EVERTICES. Each vertex is considered under the following rules:

L.-    No evidence is created directly by this type of vertex.
       Nevertheless, the "L" is used in many combinations
       with other vertices to account for evidence. As we
       saw, Nosabo uses L's. "Legs" will use them, too.

FORK.- == No link is created if any of the three regions is
       background (but see below).
       Example (unless otherwise indicated, all examples
       are from figure 'BRIDGE', page 94): Vertex J
       does not generate links.
       == Otherwise, three links are created as shown, except
       that each one may be inhibited by Nosabo.
       Example. Vertex JB only produces link :5-:8.
       Link :5-:9 is inhibited because S is a 'T'; Nosabo
       also forbids link :8-:9 because KB is an 'arrow'.
       This last rule is the most powerful of the heuristics.
       == Two links are created as shown, without asking Nosabo,
       if the fork is connected to the central line of
       an arrow.
       Example: In fig. R19, PA generates links :29-:17
       and :35-:17.
       This last heuristic is of help where there are concave
       objects (Fig. R19).
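The first two FORK rules can be sketched as follows. This is only an illustration: `fork_links` and the `inhibits` predicate (standing in for a consultation of Nosabo) are hypothetical names, regions are written as strings, and the third rule (fork connected to the central line of an arrow) is not modeled.

```python
from itertools import combinations


def fork_links(regions, background, inhibits=lambda a, b: False):
    """Strong links proposed by a FORK whose three regions are `regions`.

    No link is produced if any of the regions is the background;
    otherwise every pair of regions is linked unless `inhibits`
    (a stand-in for Nosabo) forbids that pair.
    """
    if background in regions:
        return []
    return [(a, b) for a, b in combinations(regions, 2) if not inhibits(a, b)]


# Vertex JB of figure 'BRIDGE': regions :5, :8, :9, with links :5-:9 and
# :8-:9 forbidden by Nosabo, as described in the text.
forbidden = {frozenset((":5", ":9")), frozenset((":8", ":9"))}
links_at_jb = fork_links(
    (":5", ":8", ":9"),
    background=":30",
    inhibits=lambda a, b: frozenset((a, b)) in forbidden,
)
print(links_at_jb)
```

Passing the inhibitor as a parameter mirrors the remark that Nosabo acts only when it is consulted.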
ARROW.- == Link if an L is connected to its central line,
        and the region shaded contains only that arrow
        as a "proper-arrow," and no Forks.
        Region :1 contains arrow A as a "proper-arrow"; also
        region :2, but not region :3. Capisce?
        Example. BB links :10 with :4.
        Allows "lateral faces" of legs to be properly
        identified and agglutinated.
        == Otherwise, link except if inhibited by Nosabo.
        Example. D lays a link between :26 and :23.
        Powerful and general heuristic.

X.-     == No link if the X comes from the intersection
        of two lines.
        == Otherwise, link as shown except if Nosabo disagrees.
        Example. G originates links :26-:22 and :21-:30;
        this last one will later be erased or disregarded,
        since :30 is the background.

K.-     No link.



PEAK.-  Links are established between contiguous regions,
        except those to the region containing the angle
        bigger than 180°. These links are subject to
        Nosabo inhibition.
        Example. In fig. 'CORN', JJ generates links
        :8-:9 and :9-:10.
        Of certain use, specially with pyramids and
        "pointy" objects.

MULTI.- No link.
        The reason is:
        (1) If the vertex is "genuine", although it generates
            no links, the object having it will probably possess
            many other vertices, through which links will get
            established, and
        (2) if the vertex is "false" because it is the result of
            the casual coincidence of two or more genuine
            vertices, mistakes are avoided by abstaining from
            creating links. This is generally the case.
        SUGGESTION. An improvement is possible, by allowing MULTI
        vertices to place links.

T.-     == If matching T's, link as shown, without consulting
        Nosabo. Avoid linking to the background.
        Each pair of matching T's produces these links
        only once; that is, we do not produce two links
        while analyzing A and another two at B.
        Do not link if the middle region of a 'T' is the
        background.
        What we are trying to do here is to find places
        where a body appears as two disconnected parts.
        == Link (without Nosabo's consent) as shown if the
        central segment of the 'T' separates two non-
        background regions, and these have the background
        as neighbor, and part of the separations between
        background and non-background are parallel to the
        central segment of the 'T'.
        Avoid double links in the case shown in the figure
        (link just once).
        Example. TA links :21 with :27 (F-G,
        KA-TA and JA-IA are parallel).
        Favors occluded bodies with parallel faces.
        Also, see "STUDY" in listing, still an
        experimental feature.
        == Two links are placed as shown (without asking
        Nosabo) if the central line of the T is
        connected to the central line of an arrow.
        It is of help where there are concave objects.

Table 'Global Evidence' shows compactly the main rules just discussed.



LOCALEVIDENCE. Weak or local links are laid here; they are used to
indicate, in a feebler way, that two faces or regions may be part of
the same object.

Nosabo can not inhibit local links.

LEG.-   == A weak link is placed as shown (dotted) if,
        Throughtes, an L is connected to an Arrow,
        and the two indicated edges are parallel.
        We call this configuration 'Leg'.
        Example (all examples from figure 'BRIDGE',
        except if counterindicated). Vertex FA is
        a Leg (FA - QB is parallel to EA - DA)
        that links weakly :18 with :19.
        == In a Leg, if there are two matching T's as
        shown, a weak link is placed correspondingly.
        Example. In fig. 'TRIAL' (page 88), a weak
        link or evidence is placed between :7 and :4,
        because EE is a Leg, and L and E are matching T's.



The heuristics described will sometimes produce a "wrong linkage,"
linking two regions that do not belong to the same body. These mistakes
are not likely to confuse SEE, since the handling of these links (and
all of SEE, in general) is done under the assumption or knowledge that
the information is noisy and somewhat unreliable.

Strong links are shown dotted; weak links are not shown.

[Table of diagrams; surviving panel labels: (A), (D), (E), (G), (H), (I).]

TABLE 'GLOBAL EVIDENCE'
TRIANGLE.- == A Triangle is a 3-vertex region, of which
        two are interconnected T's, the type of the
        other vertex being irrelevant.
        Two triangles are weakly linked if they are
        (1) facing each other, and
        (2) "properly contained", meaning that D has
            to fall on the same side of AB as C does,
            and similarly for the other vertices, and
        (3) AB is parallel to EF, and AC to DE.
        The heuristic helps with faces of a prism
        that is badly obscured. It does not help
        much, since it gives only a weak link. On
        the other hand, this weakness prevents
        mistakes when the two triangles are not from
        the same body.
        SUGGESTION. A possible improvement consists of
        choosing the closest of two triangles, if several
        candidates are possible.
        Example. In figure 'WRIST' (page 156), weak
        links are placed between triangles 5 and 6, and
        between 1 and 2.

Example. Figure 'TRIAL' receives the following strong links (full
lines) and weak links (dotted lines).

FIGURE 'TRIAL'
The program analyzes this scene and finds 3 bodies:

(BODY 1 IS :6 :2 :1)
(BODY 2 IS :11 :12 :10)
(BODY 3 IS :4 :9 :5 :7 :3 :8 :13)
The links could be represented as

Figure 'TRIAL - LINKS'
Strong (solid) and weak (broken lines)
links of figure 'TRIAL'.

SEE prints these links in the following way (cf. also p. 110):

((NIL) ((:11) G0014 G0013 G0011 G0010)
 ((:12) G0015 G0014 G0013 G0012) ((:13) G0021)
 ((:9) G0022 G0021 G0020 G0019 G0017 G0016)
 ((:10) G0015 G0012 G0011 G0010)
 ((:3) G0034 G0025 G0024)
 ((:4) G0033 G0032 G0026 G0025 G0023)
 ((:6) G0031 G0030 G0029 G0027)
 ((:5) G0026 G0023 G0022 G0018 G0017)
 ((:7) G0033 G0032 G0019 G0018 G0016)
 ((:8) G0034 G0024 G0020)
 ((:2) G0035 G0031 G0029 G0028) ((:14))
 ((:1) G0035 G0030 G0028 G0027))

Strong links of 'TRIAL'. For instance, :11 has four links emanating
from itself.

Weak links of scene 'TRIAL' are

((:2 :1) (:6 :2) (:6 :1) (:4 :5) (:9 :5)
 (:13 :9) (:3 :8) (:9 :8) (:4 :7) (:9 :7)
 (:12 :10) (:11 :12))

Weak links of 'TRIAL'. For instance, there is a weak link between
:12 and :10.
The next step is to gather all this evidence and to form tentative
hypotheses of objects as assemblages of faces with many links among
them.

NUCLEI CONSOLIDATION

All the links to the background are deleted, since it can not
be part of any body.

Strong and weak links exist among the different regions of a
scene. They are consolidated in that order by two subroutines,
Global and Local.

GLOBAL. Groups of faces with an abundance of strong links among them
are first found; these "nuclei" will later compete for other faces
more loosely linked.

Definition: A nucleus (of a body) is either a region or a set of
regions that has been formed by the following rule.

Rule: If two nuclei are connected by two or more strong links,
they are merged into a larger nucleus.

More detailed rules appear in page ZS , in section 'Simplified 
view of Scene Analysis'. 

For instance, in the figure below, regions :1 and :2 are put
together, because there exist two links among them, to form nucleus
:1-2. Now we see that region :3 has two links with this nucleus :1-2,
and therefore the new nucleus :1-2-3 is formed.

Fig. 'CONSOLIDATION'
Two links between two nuclei merge them.

We let the nuclei grow and merge under the former rule, until
no new nuclei can be formed.
When this is the case, the scene has been partitioned into
several "maximal" nuclei; between any two of these there is at most
one link. For example, figure 'TRIAL-LINKS' will be transformed into
figure 'TRIAL-NUCLEI'.
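The growth of nuclei under the two-link rule can be sketched as below. This is not the GLOBAL subroutine itself: `consolidate`, the integer region numbers and the list-of-pairs representation of strong links are assumptions for the example (SEE marks links with gensyms), and links to the background are assumed to have been deleted already.

```python
from collections import Counter


def consolidate(regions, strong_links):
    """Merge nuclei joined by two or more strong links until nothing changes.

    `regions` is an iterable of region names; `strong_links` is a list of
    (region, region) pairs, one entry per link.
    """
    nucleus_of = {r: frozenset([r]) for r in regions}
    merged = True
    while merged:
        merged = False
        # Count the links that currently run between distinct nuclei.
        counts = Counter()
        for a, b in strong_links:
            na, nb = nucleus_of[a], nucleus_of[b]
            if na != nb:
                counts[frozenset([na, nb])] += 1
        for pair, n in counts.items():
            if n >= 2:
                na, nb = tuple(pair)
                union = na | nb
                for r in union:
                    nucleus_of[r] = union
                merged = True
                break  # recount the links after every merge
    return set(nucleus_of.values())


# A layout in the spirit of fig. 'CONSOLIDATION' (the exact links of the
# figure are an assumption): two links between :1 and :2, one link from
# :3 to each of them, and a lone link from :4 to :3.
links = [(1, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
print(consolidate([1, 2, 3, 4], links))
```

Region :4 keeps its own nucleus because only one link joins it to the nucleus :1-2-3, which is the situation of "at most one link" between maximal nuclei described above.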



Figure 'TRIAL - NUCLEI'
Maximal nuclei of scene TRIAL.

LOCAL. If some strong link joining two "maximal" nuclei is also
reinforced by a weak link, these nuclei are merged.

The weak links of figure TRIAL are shown as dotted lines in
figure 'TRIAL-LINKS' (page 90); they transform figure 'TRIAL-NUCLEI'
into figure 'TRIAL-FINAL'.
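A possible reading of the LOCAL rule is sketched here. The text does not say exactly what "reinforced" means; the sketch assumes that a weak link running between the same two nuclei as the strong link is enough. The name `merge_reinforced` and the data layout are hypothetical.

```python
def merge_reinforced(nuclei, strong_links, weak_links):
    """Merge pairs of nuclei joined by a strong link and also by a weak link.

    `nuclei` is an iterable of collections of regions; links are
    (region, region) pairs between regions that belong to some nucleus.
    """
    nucleus_of = {r: frozenset(n) for n in nuclei for r in n}
    changed = True
    while changed:
        changed = False
        # Pairs of nuclei currently joined by at least one weak link.
        weak_pairs = {frozenset([nucleus_of[a], nucleus_of[b]])
                      for a, b in weak_links}
        for a, b in strong_links:
            na, nb = nucleus_of[a], nucleus_of[b]
            if na != nb and frozenset([na, nb]) in weak_pairs:
                union = na | nb
                for r in union:
                    nucleus_of[r] = union
                changed = True
                break  # nuclei changed: recompute the weak pairs
    return set(nucleus_of.values())


# Nucleus :1-2 and the single region :3 share one strong link (not
# enough for GLOBAL); a weak link between them tips the balance.
print(merge_reinforced([{1, 2}, {3}], strong_links=[(2, 3)], weak_links=[(1, 3)]))
```

Without the weak link the two nuclei stay apart, which is why a wrongly placed weak link alone cannot join two bodies.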



Figure 'TRIAL - FINAL'
Nuclei of scene TRIAL after merging
suggested by local links.

BODY RETOUCHING

Additional heuristics assign unsatisfactory faces to existing
nuclei, or isolate them. SINGLEBODY and SMB are used for this task.

SINGLEBODY. A strong link joining a nucleus and another nucleus
composed of a single region is considered enough evidence to merge the
nuclei in question if there is no other link emanating from the single
region. A message is printed indicating these merges.
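The SINGLEBODY rule might be sketched as follows. The sketch assumes that "no other link" refers to strong links only, omits the printed message, and uses hypothetical names (`singlebody`) and integer region numbers.

```python
def singlebody(nuclei, strong_links):
    """Attach single-region nuclei that have exactly one strong link.

    A region that is a nucleus by itself and from which exactly one
    strong link emanates is merged into the nucleus at the other end.
    """
    nucleus_of = {r: frozenset(n) for n in nuclei for r in n}
    for region in sorted(nucleus_of):
        if len(nucleus_of[region]) != 1:
            continue                     # not (or no longer) a single region
        touching = [link for link in strong_links if region in link]
        if len(touching) != 1:
            continue                     # no link, or several claimants
        a, b = touching[0]
        other = b if a == region else a
        if other == region:
            continue
        union = nucleus_of[region] | nucleus_of[other]
        for r in union:
            nucleus_of[r] = union
    return set(nucleus_of.values())


# :16 has one strong link, to nucleus :18-19, and is absorbed. The
# single region :28 is a made-up case with two links, so it stays alone.
nuclei = [{16}, {18, 19}, {28}, {26, 22, 23}, {24, 25}]
links = [(16, 18), (28, 26), (28, 24)]
print(singlebody(nuclei, links))
```

A region with two links is left alone because two nuclei claim its possession.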

Such rules produce no change in fig. 'TRIAL-FINAL', and
therefore its nuclei will be reported as bodies.

A more complex example shows the retouching operation. Figure
'BRIDGE' undergoes these transformations:

Scene BRIDGE                                   Fig. 'BRIDGE'

Weak and strong links among regions            Fig. 'LINKS-BRIDGE'

Maximal nuclei (2 or more strong links)        Fig. 'NUCLEI-BRIDGE'

Maximal nuclei enlarged by weak link action    Fig. 'NEW-NUCLEI-BRIDGE'

Id. enlarged by single undisputed regions      Fig. 'FINAL-BRIDGE'
(single region, single strong link)

Id. enlarged by good neighbors, "goodpal".     Fig. 'FINAL-BRIDGE'
Final result.                                  (no change in this case)

FIGURE 'BRIDGE'
We see that in figure 'NEW-NUCLEI-BRIDGE', nucleus :16 is merged
by SINGLEBODY with nucleus :18-19 (see figure 'FINAL-BRIDGE'). Nucleus
:28-29 is not joined with :26-22-23 or with :24-25-27-12-21-9. Even if
nucleus :28-29 were composed of a single region, it still would not be
merged, since two links emerge from it: two nuclei claim its possession.

This rule joins single regions having only one possible "owner"
nucleus.

Two systems of links are used by SEE. One consists of weak and
strong links, produced by examining each vertex, and culminates forming
nuclei under GLOBAL, LOCAL, etc.

The second system constitutes a different network of links; SMB
works in the second system. It is motivated by the desire to collect
evidence not directly available through the vertices. It gathers
evidence from the lines or boundaries separating two regions, in an
effort to answer the question: Are two given neighboring regions part
of the same object, or are they not? That is, are two contiguous regions
"good neighbors" ("good pals")? If they are, a special link, the s-link,
is placed, eventually forming a network independent of weak and strong
links, that will collapse in a somewhat peculiar way. Thus, a great
amount of unnecessary duplication could be possible in the information
carried by both systems of links. To reduce it, the s-links are designed
to complement and extend, rather than to re-do, the agglutination
produced by weak+strong links. They (the s-links) will, therefore, mainly
study single faces not satisfactorily accounted for.

SMB uses the predicate (GOODPAL R S), which acquires the value T
(true) if R and S are two contiguous "good neighbors" regions.
To satisfy this, their common boundary must not be empty, and must
lack L's, FORKs, ARROWs, K's, X's, PEAKs, MULTIs. In addition:

== [Figure.] Not good: (GOODPAL R S) = F.

== [Figure.] Not good: (GOODPAL R S) = F, where the marked vertex is
   an "L" or (in general) a vertex that makes (NOSABO R S) to be true.

== O.K. otherwise: (GOODPAL R S) = T.
   In particular, the configuration shown [figure] is O.K. if
   (NOSABO R S) = F.

SMB analyzes the nuclei formed under weak+strong links that, after
SINGLEBODY actuation, still remain formed by a single face or
region. The steps are:

1. A network of s-links is formed by putting an s-link between regions
   forming a nucleus all by themselves, and their goodpal neighbors.

2. If exactly one nucleus is s-linked to one of those regions (that
   is to say, if such single-region nucleus has precisely one
   good-pal), the region gets absorbed by the nucleus; otherwise
   the region is reported as a body in itself (consisting of a single
   region).

In the figure, nothing changes because :3 has two s-links.

Note that

a. The s-links are not used to form nuclei as the weak+strong links
   were; they only help certain isolated faces to join bigger
   structures.

b. Two s-links between two regions have the effect of one.

Example. In figure 'HARD', regions :6 and :7 get joined by SMB.
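The two steps of SMB can be sketched as below. This is only an illustration under stated assumptions: `smb` is a hypothetical name, `neighbors` is an assumed table of contiguous regions, and `goodpal` is a caller-supplied predicate standing in for (GOODPAL R S), whose geometric conditions are not reproduced here.

```python
def smb(nuclei, neighbors, goodpal):
    """Absorb single-region nuclei that have exactly one good-pal nucleus.

    `nuclei` is an iterable of collections of regions, `neighbors` maps a
    region to its contiguous regions, and `goodpal(r, s)` says whether
    two contiguous regions are good neighbors.
    """
    nucleus_of = {r: frozenset(n) for n in nuclei for r in n}
    singles = sorted(r for r, n in nucleus_of.items() if len(n) == 1)
    for region in singles:
        if len(nucleus_of[region]) != 1:
            continue                     # already absorbed in this pass
        # Distinct nuclei reached through s-links; using a set makes two
        # s-links to the same nucleus count as one.
        pals = {nucleus_of[s] for s in neighbors.get(region, ())
                if s in nucleus_of and goodpal(region, s)}
        pals.discard(nucleus_of[region])
        if len(pals) == 1:
            union = nucleus_of[region] | next(iter(pals))
            for r in union:
                nucleus_of[r] = union
    return set(nucleus_of.values())


# Made-up neighborhood: :6 and :7 are each other's only good pal, while
# :3 has two good pals and therefore stays a body in itself.
nuclei = [{3}, {6}, {7}, {8, 9, 10}]
neighbors = {3: [8, 7], 6: [7], 7: [6]}
print(smb(nuclei, neighbors, goodpal=lambda r, s: True))
```

Regions absent from every nucleus (the background, for instance) are ignored when s-links are laid in this sketch.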
SEE 58 ANALYZES HARD
EVIDENCE
LOCALEVIDENCE
TRIANG
GLOBAL

((NIU) ((*94n ((*6)) ((t36)) ((<24) 60026 

0044 00043 C0042> (itl7) (>0047 6004« 60045 

0041 60039) ((121) S0050 60040 G0039 60029 

003S 60036 60019) ((<26) 60054 60053 60037 

60055 60023 C0020 60015) ((t32) 60057 60056 60034 
8 60048) ((14) 60058 60048) (OlO) 60059 60032 60031) ((* 
119) 60064 60063 60062 60061) (ft20) 60t)64 et)062 60060 60. 
130) 60056 60035 60033 60016) ( ( 1 15) (»0066) <(tl6) 60066) 
((NIU) ((t34)) ((t6)) (((36)) (NIC) (NIL) (NlU) (NIL) ( ( t* 
019 60053 60036 60054 60038 60037 G0019) (NIL) ( ( »24 122 
0040 60039 60029 60028 60027 60024 60022 60o5S 60023 60C2 
) (NIL) ((15 <4) 60048 60058 60048) (NIL) ((ti3^ tl7 M4) i 
*18 *19 120) 60060 60064 60063 60061 60064 60062 60060 60 
132 131 *30) 60033 S.0057 60034 60056 60035 60033 60016) ( 



60025 60023 60«tc. 
60044) ((I7))«te 
60028 60027) ( 
60036) ( (127) - 
60033) 



LOCAL
(LOCAL ASSUMES (:11) (:12) SAME BODY)
(LOCAL ASSUMES (:15) (:16) SAME BODY)
((NIL) ((:34)) ((:6)) ((:36)) (NIL) (NIL) ((:7)) (NIL) (N
019) ((:24 :22 :3 :23 :21 :28 :29) G0020 G0026 G0025 G004



0055 60023 60020 60015) (dl 12 133) 60052 60051 S0017 60' 
43 60047 60046 60044 60047 6004S 60043 80042) (NIL) (dlS 
<10 (8) 60032 30032 30065 60059 60031 60030) ((i32 >31 t; 
> (NIL) ((*3S)) ((<12 111) 60067) (NIL)) 
LOCAL 

(((<12 Ml) 60067) ((tl6 >15) 60066) ((132 (31 *30) 60033 
60065 60059 60031 60030) ((*18 (19 (20) 60060 60064 60063 
6 60044 60047 60045 60043 60042) (((5 (4) 6o048 60058 600< 
3 (21 (28 (29) 30020 60026 60025 60049 6004t 60021 60050 ( 
15) (((25 (26 (27) 60019 60053 60036 60054 60038 60037 601 
LOCAL
SMB
(SMB ASSUMES (:7) (:6) SAME BODY)
RESULTS
(BODY 1. IS :12 :11)
(BODY 2. IS :16 :15)
(BODY 3. IS :32 :31 :30)
(BODY 4. IS :9 :10 :8)
(BODY 5. IS :18 :19 :20)
(BODY 6. IS :13 :17 :14)
(BODY 7. IS :5 :4)
(BODY 8. IS :1 :2 :33)
(BODY 9. IS :24 :22 :3 :23 :21 :28 :29)
(BODY 10. IS :25 :26 :27)
(BODY 11. IS :7 :6)
NIL

RESULTS FOR HARD
RESULTS. After having screened out the regions that belong to the
background, the nuclei are printed as "bodies".

In this process, the links which may be joining some of the
nuclei are ignored: RESULTS considers the links of figure
'FINAL-BRIDGE', for instance, as non-existent. These links
are the result of imperfections in the heuristics, mistakes in the
placement of links, and may point out different parsings. An
improvement to SEE will be to try to "explain" these residual links.
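The final step can be sketched as follows, imitating the printed form of the results shown earlier. The name `format_results` and the data layout are assumptions; the residual links between nuclei are simply not consulted, in keeping with the description above.

```python
def format_results(nuclei, background):
    """Return one "(BODY n. IS ...)" line per nucleus, background removed.

    `nuclei` is a list of lists of region numbers; `background` is the
    set of region numbers that belong to the background.
    """
    lines = []
    for nucleus in nuclei:
        regions = [r for r in nucleus if r not in background]
        if not regions:
            continue                      # nothing left after screening
        names = " ".join(f":{r}" for r in regions)
        lines.append(f"(BODY {len(lines) + 1}. IS {names})")
    return lines


# Nuclei of scene R3, with region :7 as the background.
for line in format_results([[2, 1, 4], [6, 5, 3], [7]], background={7}):
    print(line)
```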

Summary. SEE uses a variety of kinds of evidence to link together
regions of a scene. The links in SEE are supposed to be general
enough to make SEE an object-analysis system. Each link is a piece
of evidence that suggests that two or more regions come from the
same object, and regions that get tied together by enough evidence
are considered as "nuclei" of possible objects.

Examples and discussion are in the next section.
ANALYSIS OF MANY SCENES

Until we have an adequate analytic theory, the behavior of a
heuristic program is best understood with examples. There are
several ways to go about this:

Simple. In order to learn what a program does, simple examples, each
one illustrating a single feature or group of features, are very
appropriate.

Favorable. A shiny impression of a set of routines is obtained by
presenting 'favorable' cases, designed to enhance the characteristics
of the program in front of the unsophisticated observer.

Of course, of all possible inputs, there is a subset that will
produce outputs very pleasant in terms of speed, ease of
programming, generality, accuracy, or whatever other feature that
system advertises. This subset tends to get the highlights in the
descriptions.

Nasty. Examples in which the program does particularly poorly are
useful, if well chosen, to illustrate the weak points and pitfalls
of the techniques used, the restrictions and constraints in the input,
etc. They may point out improvements or extensions.

Silly. Examples having very weak connection with the purpose or
intention of the routines or algorithms discussed serve no useful
end, except perhaps to point out that the maker of such examples did
not understand the issues. For instance, one could take a box full
of pins, drop them on the table, take their picture and ask SEE to
work on it.

A collection of simple, favorable, and nasty examples follows.
They are not in that order.

A discussion is found at the end of this section.
Stereo scenes. Examples of stereographic pictures will be found in
the section 'Stereo Perception'.

Finding the background. Examples where the background is not known
in advance and has to be deduced are given in the section 'Background
Discrimination by Computer'.
LIST OF SCENES ANALYZED BY SEE IN THIS SECTION

(Table with columns Name, Comments, and Page. Surviving entries: scenes R17, R3, SPREAD, STACK*, TRIAL; pages 108, 110, 111, 114, 116, 119.)
Scene R17. The three prisms are found. In scenes like this, the
position of one or two vertices may alter the analysis made by SEE,
by changing radically the slope-direction of a small segment (such
as KL and GH, figure 'R17'), killing several T-joints and separating
regions :1-2 from :5-6.

Small errors in the coordinates of vertices K, L, G, H, and a few
others will drastically change the slope of segments of short length.
This will transform G and K into Arrows or Forks, so that G and K
will no longer be matching T's (cf. also 'Conservatism and Tolerance',
page 173). As a consequence, body :2-1 will be disconnected from body
:5-6. This annoying problem is not difficult to correct, at preprocessor
level, since there is good information about the slope of the
(long) line BN: the slope of KL has to agree with the slope of
BN, giving a good estimate of its true shape.

SUGGESTION: The rule seems to be that these short segments should be
"re-oriented" if necessary, to agree with the longer ones, which are
more reliable. Deeper analysis is found in section 'On Noisy Input'.

SUGGESTION: The preprocessor should consider the hypothesis
that BKLN are colinear, or SEE should propose it
for confirmation (see 'Division of Work in Computer Vision', p. 60).
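
The suggested re-orientation can be pictured as projecting the doubtful vertices onto the reliable long line. The following is a minimal sketch under stated assumptions: the function name `straighten`, the distance tolerance `tol`, and the sample coordinates are hypothetical and are not taken from SEE or from its preprocessor.

```python
import math

def straighten(long_a, long_b, points, tol=1.0):
    """Project each point lying within `tol` of the line long_a-long_b
    onto that line; points farther away are left untouched."""
    ax, ay = long_a
    bx, by = long_b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    ux, uy = dx / length, dy / length            # unit vector along the long line
    result = []
    for (px, py) in points:
        t = (px - ax) * ux + (py - ay) * uy      # position along the line
        dist = abs((px - ax) * uy - (py - ay) * ux)  # perpendicular distance
        if dist <= tol:
            result.append((ax + t * ux, ay + t * uy))
        else:
            result.append((px, py))
    return result

# Hypothetical data: B and N define the long line; K and L are noisy
# vertices that should lie on it; the last point belongs elsewhere.
B, N = (0.0, 0.0), (100.0, 0.0)
fixed = straighten(B, N, [(40.0, 0.6), (45.0, -0.4), (50.0, 10.0)])
```

Before the correction the short segment KL in this example slopes noticeably; afterwards it has exactly the slope of BN, so a vertex on it can again be recognized as a T.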

The % signs. In the printouts of some scenes, such as R17 (see 'RESULTS
FOR R17' in page (01), a % sign appears as part of the name of every
region and vertex; that is, %:3 instead of :3. This will be the case
in all scenes having names starting with the letter R, differentiating
the "right regions" from the "left regions". This will become clear
in the section 'Stereo Perception', page ^33; until then, disregard
the %'s.



FIGURE 'R17'

The three prisms were correctly found.
There are several "nasty" coincidences
in this scene, simulating the data
that a not-too-satisfactory preprocessor
will tend to provide.
Scene L3. Without difficulty, two bodies are found. Each region
contains four strong links relating it with other regions (see
'RESULTS FOR L3'). LOCAL is not needed to form nuclei; neither
is SINGLEBODY or SMB.

Explanation of the printout produced by the program. A
printout of the results appears. The format is the same for every
scene. It starts by saying

SEE 58 ANALYZES L3

which identifies the name of the program (SEE), its number (version
number 58), and the scene to be analyzed (L3).

EVIDENCE
LOCALEVIDENCE
TRIANG
GLOBAL

The different sections of the program print their name when they
are entered.

We then come to a list containing regions (such as :6) and 'gensyms'
(such as G0009):

((NIL) ((:6) G0009 G0007 G0005 G0004) ((:5) G0010 G0008
G0007 G0004) ((:4) G0010 G0009 G0008 G0005) ((:1) G0015
G0013 G0012 G0011) ((:2) G0016 G0014 G0013 G0011)
((:3) G0016 G0015 G0014 G0012) ((:7)))

This list contains the nuclei and the links (strong links); the first
nucleus that we see is ((:6) G0009 G0007 G0005 G0004), meaning
that from nucleus (or region) :6 emanate four links, namely G0009,
G0007, G0005 and G0004. We can represent this graphically:




The total representation of the above list is thus 





We then see "LOCAL" (when this function is entered, it prints its
name), then the list of nuclei again, this time shrunk somewhat by
LOCAL; finally, we see "RESULTS", and then 2 bodies, followed
by NIL, meaning the end of the program. (See page 112.)
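
To make the reading of such a list concrete, the sketch below counts how many links each pair of regions shares: a gensym that appears under two regions is one link between them. It is written in Python rather than the LISP of SEE; the dictionary layout and the name `links_between` are assumptions of this sketch, and the gensym names are a reading of the L3 printout, which is damaged in places.

```python
from collections import defaultdict
from itertools import combinations

# Nuclei of scene L3 as read from the printout; the background
# region :7 carries no links.
nuclei_l3 = {
    ":6": ["G0009", "G0007", "G0005", "G0004"],
    ":5": ["G0010", "G0008", "G0007", "G0004"],
    ":4": ["G0010", "G0009", "G0008", "G0005"],
    ":1": ["G0015", "G0013", "G0012", "G0011"],
    ":2": ["G0016", "G0014", "G0013", "G0011"],
    ":3": ["G0016", "G0015", "G0014", "G0012"],
    ":7": [],
}

def links_between(nuclei):
    """Return {(region_a, region_b): number of links shared}."""
    owners = defaultdict(list)
    for name, links in nuclei.items():
        for gensym in links:
            owners[gensym].append(name)   # each link is listed at both ends
    counts = defaultdict(int)
    for names in owners.values():
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return dict(counts)

counts = links_between(nuclei_l3)
```

Under this reading every pair of regions inside :6-5-4 and inside :1-2-3 shares two links, and no link crosses from one group to the other, which agrees with the two bodies reported.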
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Scene R3 ^^ bodies are found in this scene. Vertex T is 
classified as of type 'T', hence only one link exists between
:2 and :4.

All scenes have regions, vertices and lines (edges) joining
vertices and separating regions. We generally omit the names of the
vertices from the drawing (figure 'R3'); we are also omitting the
coordinate axes.

Since each region has an inside and an outside, the following
are invalid or illegal configurations in a scene:

A line ending nowhere is illegal.

Our scenes should be such that,
to disconnect a separate component
of the graph into two components,
we have to remove (delete) at least
two edges. The graph above is
"illegal" as input to our program,
since the criterion is not met:
removing edge B will disconnect
the graph (cf. page ^1 ).

Incidentally, some optical
illusions are "recognized" or rejected
because they come from illegal
scenes of the type just described
(cf. section 'Optical Illusions').

See 'Illegal scenes', page i 17, in section 'On noisy input.'
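
The legality criterion above says that no single edge may disconnect its component; in graph terms, the scene must contain no "bridge". A possible check is sketched below using the standard depth-first search for bridges. This is a swapped-in textbook technique, not something the thesis describes, and the function name `find_bridges` and the edge-list format are assumptions of the example.

```python
def find_bridges(edges):
    """Return the edges whose removal disconnects their component.

    edges: list of (vertex_u, vertex_v) pairs describing the line drawing.
    A scene is legal in the above sense only if the result is empty.
    """
    adj = {}
    for i, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, i))
        adj.setdefault(v, []).append((u, i))
    order, low, bridges = {}, {}, []
    counter = 0

    def visit(u, via):
        nonlocal counter
        order[u] = low[u] = counter
        counter += 1
        for v, i in adj[u]:
            if i == via:
                continue                      # do not go back along the same edge
            if v in order:
                low[u] = min(low[u], order[v])
            else:
                visit(v, i)
                low[u] = min(low[u], low[v])
                if low[v] > order[u]:         # nothing below v reaches u or above
                    bridges.append(edges[i])

    for u in list(adj):
        if u not in order:
            visit(u, None)
    return bridges

# A square is legal; a square with a line ending nowhere is not.
square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
dangling = square + [("D", "E")]
```

A line ending nowhere is the simplest case of a bridge, so the same test covers both illegal configurations mentioned above.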
FIGURE 'R3'
A scene analyzed by the program.
Scene SPREAD. Body :41-42 was found; also :8-18-19. In the first
case, there was one strong link between :41 and :42, because of
heuristic (g) of table 'GLOBAL EVIDENCE' (page ST"), and SINGLEBODY
completed the object. In the second case, heuristic (g) could not
be applied, and SMB had to join :19 with :18.

Bodies :29-30-31-32 and :25-26-27-28 are adequately found.
Also the badly occluded long body :10-9-11-12-3 is found.

Body :21-6-5-20 is found as one body. An older version of
SEE {Guzman FJCC 68} used to report two: :6-21 and :5-20. The
change is as follows: one link is placed between :6 and :5 because of
the matching T's; the other link is a weak one, placed because :5 and :20
form a LEG; a weak link is also placed between :6 and :5.

:24 gets reported isolated, instead of together with :22-23,
because no Leg is seen; but see comment (page 3a) in section
'Simplified View of Scene Analysis'.

SEE tries to find a "minimal" answer; minimal in the sense
that it will try to explain the scene with the minimum possible
number of bodies (cf. section 'The Concept of a Body'). That is the
reason why :41 and :42 were joined in one body, instead of two, which
is another possible correct answer. That is also true of :19-18-8,
interpreted as one parallelepiped with a vertical face (:19) and a
horizontal face (:18-8).

The background of SPREAD is also computed (see page 226 of section
'Background Discrimination by Computer').
FIGURE 'SPREAD'

Bodies :10-9-11-12-3 and :6-21-5-20 are properly found. The body
:19-18-8, which is a parallelepiped with a vertical face (:19) and
a horizontal face (:8-18), is also correctly identified.
Scenes STACK and STACK*. The bodies of these scenes are accurately
identified by our program, which is written in LISP. In both cases
the body :4-15-16 is found.

These scenes show that in many instances one could drastically
alter the position of a vertex, without modifying the output of SEE
(compare figure 'STACK' with 'STACK*').

Other examples would show that the vertices of type 'L' can be
arbitrarily displaced, so long as their type remains 'L' and other
vertices do not change type, without detrimental effect. This
displacement may possibly affect some heuristics that use concepts of
parallelism or colinearity, but not the rules that use the shape or
type of a vertex (cf. table 'VERTICES', page 69) for placing and
inhibiting links. See 'Misplaced vertices' in page2\l , in section
'On noisy input.'
FIGURE 'STACK'

Every body is correctly identified. Compare with scene STACK*.
This pair of drawings illustrates the fact that it is often
possible to disturb the coordinates (the position) of a vertex
without introducing errors in the recognition.

FIGURE 'STACK*'
Every body is correctly found. Compare with scene STACK.
Scene L10. The concave object :11-15-14-7-6 presents no
problem, since there are plenty of visible vertices
(figure 'L10'), and SEE makes good use of them.

SINGLEBODY is necessary to join regions :13 and
:2.

The bodies of a scene do not need to be
prismatic in shape, nor convex. Their vertices could
have errors in their two-dimensional position. Table
'ASSUMPTIONS' (page 255) specifies the suppositions that
our program obeys.
FIGURE 'L10'

SINGLEBODY had to join :2 with :13.

All four bodies were happily identified.
Scene R10. Four bodies are found by our program in R10.

The scene is a good example of a "noisy" scene, in which edges that
should be straight look crooked. This is because the coordinates
of each vertex are "imprecise"; the vertices have some error in
their coordinates. Other scenes also show this tendency; they
accurately represent the data analyzed by SEE (the scenes in their
final form were drawn by program, then inked manually), and should
not be considered as "sloppy drawing jobs".

SEE has several ways to cope with these imperfections:

(1) Tolerant definitions of parallelism and colinearity.

(2) Insensitivity of heuristics to displacements of the vertex.
    For instance, vertex V will inhibit the link that Z proposes,
    either when V is of type 'Arrow' or when it is of type 'T'
    (but not when 'Fork').

(3) Large variations in the coordinates of a vertex are possible
    before that vertex changes type. Vertices of type 'T' are an
    exception, changing into a Fork or an Arrow by a small
    displacement.

Nevertheless, it is possible to "straighten" these vertices,
by following the suggestion in the comments to scene R17.

The section 'On Noisy Input' deals with these matters.
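
The fragility of T vertices noted in point (3) can be illustrated with a small classifier for vertices where three lines meet. This is a sketch, assuming the usual definitions (a T has two colinear lines, an Arrow has one angle larger than 180 degrees, a Fork has all three angles smaller than 180 degrees); the function name `classify` and the tolerance in degrees, `tol_deg`, are hypothetical and do not reproduce the tests used by SEE.

```python
import math

def classify(vertex, neighbours, tol_deg=0.0):
    """Classify a vertex where exactly three lines meet.

    vertex:     (x, y) of the vertex.
    neighbours: the three points at the far ends of its lines.
    tol_deg:    how far from 180 degrees an angle may be and still
                count as 'two colinear lines' (an assumed tolerance).
    """
    vx, vy = vertex
    angles = sorted(math.degrees(math.atan2(y - vy, x - vx)) % 360.0
                    for x, y in neighbours)
    # Angles between consecutive lines, going around the vertex.
    gaps = [(angles[(i + 1) % 3] - angles[i]) % 360.0 for i in range(3)]
    if any(abs(g - 180.0) <= tol_deg for g in gaps):
        return "T"
    return "ARROW" if max(gaps) > 180.0 else "FORK"

# The bar of the T runs from (-10, 0) to (10, 0); the stem goes down.
ends = [(-10.0, 0.0), (10.0, 0.0), (0.0, -10.0)]
up = classify((0.0, 1.0), ends)        # vertex displaced one unit upwards
down = classify((0.0, -1.0), ends)     # vertex displaced one unit downwards
tolerant = classify((0.0, 1.0), ends, tol_deg=15.0)
```

With no tolerance, a displacement of one unit in either direction turns the T into an Arrow or a Fork; with a generous tolerance the vertex is still read as a T.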
FIGURE 'R10'

The scene contains "noisy" vertices; hence, some
edges look bent. SEE has resources to cope with these
problems.

Figures L10 and R10 form a stereo pair. In figure
'L10 - R10', information from both scenes
is combined to find the position of these objects in
three-dimensional space. See section 'Stereo Perception'.
Scene TOWER. There is no need for the use of LOCAL or SINGLEBODY
in this scene, since there are plenty of global (strong) links
among the different regions. :18-22 and :17-23 get links thanks
to the heuristic that analyzes vertices of type 'X'.

There are several "false" vertices, formed by coincidences of
edges and "genuine" vertices: the vertex common to :9, 11, 12 and 13;
the one common to :2, 4, 5, 6. They do not cause problems, because

(1) in the case of the vertex common to :9, 11, 12 and 13, it is of
    type 'MULTI', and no link is laid.

(2) in the case of the vertex shared by regions :2, 4, 5, and 6,
    it is an 'X' that will establish one link between :4 and :5 (which
    is correct), and another between :2 and :6 (which will do no
    harm, since we need two "wrong" or misplaced links to cause a
    recognition mistake).

Compare with scene 'REWOT'.
FIGURE 'TOWER'

A "wrong" link is placed between :2 and :6,
without serious consequences. Results for
this scene are in 'RESULTS FOR TOWER'.
Scene REWOT. This scene (see figure 'REWOT') is the same as the
scene TOWER (see figure 'TOWER'), but upside down. The program
obtains identical results for both scenes (see 'Results for Tower'
and 'Results for Rewot'), because SEE does not use information about
a body supporting or leaning on another body. For instance, it
was not assumed that body :1-2-3 is partially supporting (in figure
'TOWER') body :4-5-15; clearly this assumption fails in the case of
figure 'REWOT'. But since the assumption is not followed, the program
succeeds in both cases (gives the same results).

See table 'ASSUMPTIONS' (page 255) for suppositions that the
program makes or presumptions that it does not need.

The regions :16 and :24 had to be marked as part of the
background, following standard practice (cf. 'Input Format').
FIGURE 'REWOT'

This scene is the same as the scene TOWER,
but with Y replaced by 100. - Y, and
X replaced by 100. - X: it is upside
down. SEE still finds eight bodies.
SEE 58 ANALYZES REWOT

EVIDENCE

LOCALEVIDENCE

TRIANG

GLOBAL

<INILJ (<I20» 60134 60133 60132 60131) (<tl9) 60137 60135 60134 6 

ni?ii i^llV ''^^'^ **°*" ®°*=*^ ^0»"l (C*l6r6013S 60136) {1»22) 
6013* 60136) (1 123) 60141 60139) |(*10) 60196 60144 60143 60140) 
nUl) U0156 60143 60142) U 113) 60157 60147 <014S) (( 114) 6015S 

S2^«.*?^*«.Sl2**** "**' ^°^^^ ''01»* ''Ol'l 601S0 60146) ((12) 601 
62 60161 «0155 80153 60152) (<il2) 60156 60146 60145) ((17) 60161 

ifS ?r»?Ji*5' '1*«! "1** 60142 64)140) ((««) 60159 60150 60149) ( 
i^?i«^Si?S ®°*2' 60126) ((*t6)) ((117) 60141 60139) ((MS) 60163 

S?*?2.S5"** *'•*' '=<'**' «''»*0 60130 60126) ((13) 60164 60162 601 
54 60153) ((124)) ((H) C0164 60155 601S4 60152)) 

i!ni'"r«l!Il4''Jl-f!!li''* '*'*° M9 «21) 60132 60J35 60134 60131 60137 6 
*:?. °*f? 60132) (NIL) ((116 «22) 60136 60138 60136) (NIL) (NIL) 

oiSi'-Goi;i'-L!!!i'r\<?.*VM''*^* (<«i3 *i4 .iz) 6oi4B eoi57 eo 47 e 

5J fn?2nt ,^?i fi '^^'-' *<*" *^* *•' "140 60156 60143 60144 601 
42 (.0140) 1(*6 17 t6) 60161 C0150 60151 60146 60159 60150 60149) 

\'^r, rrliJVLi^t^^ **^' '*>"' 60141 60139) (NIL) ((»15 «5 14) 60 
J^n?2i^ '•4*^ "^^^^ '*""« 60126) (NIL) <{I24)) ((I2 13 U) 6016 
1 60152 60162 60153 60164 60155 60154 60152)) 

i!^i'"J!«!!IliV*/*f*'' **' ***' *°*" "*'S 60134 60131 60137 60135 6 

.?? ??i?*'- '-** *"* ^''*=** "^=*» 60136) (NIL) (NIL) (NIL) ((113 

*14 >12) 60149 60157 60147 60158 60146 60145) ((MO HI *9) 6014 

?J°n«f-S°M' ^°^** 60142 60140) (|t6 17 16) 60161 60150 60151 60 

148 60159 60150 6014») {(116)) J(I23 .17) 6oi3« 60141 6013« ((*! 

.!«*!' '^S*^'' ®°*2« '^*'*®' '^Oi*'* 60130 60128) ((«24)) ( ( «2 13 «1) 
60161 60152 60162 80153 60164 60155 60154 60152)) ' " ' * *' 
LOCAL 

Ii'!S l^.'V 6°*** 60162 60162 60153 6(^164 60155 60154 60152) ((t 
15 15 14) 60130 60129 60163 60160 60130 60126) ((t23 «17) 60139 6 
0141 60139) ((Id 17 18) 60161 60150 601SI 60148 60159 60150 60149 
> ((>10 111 19) 60140 60156 60143 60144 60142 60140) (013 114 tl 
2) 60145 60157 60147 60156 60146 60145) ((116 122) 60136 60138 60 
136 ((120 119 121) 60132 60135 60134 60131 60137 60135 60133 601 
32 ) ) 

LOCAL

SMB

RESULTS

(BODY 1. IS :2 :3 :1)

(BODY 2. IS :15 :5 :4)

(BODY 3. IS :23 :17)

(BODY 4. IS :6 :7 :8)

(BODY 5. IS :10 :11 :9)

(BODY 6. IS :13 :14 :12)

(BODY 7. IS :18 :22)

(BODY 8. IS :20 :19 :21)

NIL

RESULTS FOR REWOT
Scene WRIST*. The concave objects are properly identified. W places
a link between :23 and :4, and another between :30 and :4. CC does
not inhibit the link between :17 and :19 ordered by the Arrow HA,
because NOSABO was never called, since the first rule of 'ARROW'
(page t'f ) was applied.

The only mistake was that objects :9-7-6 and :10-5 should be
fused and reported as only one. There is a link between :9 and :10
put by heuristic (g) of table 'GLOBAL EVIDENCE'. It is not enough.
There is also a weak link between 'Triangles' :5 and :6. OB is not
a 'Leg', so there is no weak link between :10 and :5. The situation
is as follows (see chains of links in 'RESULTS FOR WRIST*'; how to
read these chains is explained in page 110, 'Explanation of the
printout produced by the program'):

:10 and :5 will get joined later by SINGLEBODY.

Almost the same thing occurs with :1-2-22-21, but in this case
vertex A produces one strong link between :22 and :21, and vertex R, by
heuristic (g) of table 'GLOBAL EVIDENCE', also links :22 with :21. This
is enough.
FIGURE 'WRIST*'

Instead of one, two bodies were found in :9-7-6 and :10-5.
Insufficiency of links was the offending reason. All other
objects were correctly found.
Scenes L2 and R2. Two objects are found, as expected.

These scenes form a stereographic pair: two pictures taken of
the same scene from slightly different locations, maintaining parallel
the optical axes of the cameras, and the same magnification. A program,
not yet completed, is designed with the following ideas:
Left and right pictures are independently processed by SEE; L2 and
R2 in this example. The answers are

ANALYSIS OF L2              ANALYSIS OF R2

(BODY 1. IS :2 :4)          (BODY 1. IS %:1 %:2 %:4)

(BODY 2. IS :1 :5 :3)       (BODY 2. IS %:3 %:6 %:5)

The question is now: Is body :2-:4 the same body as %:1-%:2-%:4,
or is it %:3-%:6-%:5? It is required, after decomposition of the
scene into bodies, to match the left bodies with the right bodies.
If this is accomplished, one could then locate the figure in
three-dimensional space, from the two-dimensional coordinates of the
figure in the left and right scenes.

In this way it will be known where these objects are located in
the "real world".

The "matching" mentioned above is complicated as follows:

- It is possible that the number of objects observed in one view
  is different from the number in the other.

- On a given object, it is possible that SEE will make a mistake
  in the left view, but not in the right view; as a consequence,
  two bodies on the left have to be matched with one on the right.

If the two axes of the camera are on a horizontal plane, a vertex
in the left scene and its corresponding vertex in the right scene
(if visible) will have the same y-coordinate, such as H in L2 and
%I in R2. Other known relations exist, derived from the relative
position of the axes of the camera, magnification, etc. See section
'Stereo Perception'.
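
The equal-height relation gives a cheap first filter for the matching problem. The sketch below lists, for each left vertex, the right vertices lying at nearly the same y-coordinate. It is only an illustration: the function name `candidates_by_row`, the tolerance `tol`, and all the coordinates are hypothetical and are not taken from scenes L2 and R2 or from the unfinished stereo program.

```python
def candidates_by_row(left, right, tol=0.5):
    """Map each left vertex to the right vertices at nearly the same height.

    left, right: dicts from vertex name to (x, y) coordinates.
    tol:         allowed difference in y (an assumed noise allowance).
    """
    return {
        name: sorted(r_name for r_name, (_, ry) in right.items()
                     if abs(ry - ly) <= tol)
        for name, (_, ly) in left.items()
    }

# Hypothetical coordinates, for illustration only.
left_vertices = {"H": (30.0, 42.0), "J": (55.0, 10.0)}
right_vertices = {"%I": (24.0, 42.2), "%K": (50.0, 9.9), "%M": (70.0, 80.0)}
matches = candidates_by_row(left_vertices, right_vertices)
```

A filter of this kind only narrows the candidates; deciding which left body corresponds to which right body still has to cope with the complications listed above.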



FIGURE 'R2'
Two bricks are found.

FIGURE 'L2'

Even if (possibly) a face of object :4-2 is missing,
in this case SEE makes the correct identification.
Section 'On Noisy Input' deals with imperfect
information.
Scene L19. The small triangle :15 just could not get joined with
the remainder of the body :16-20-19, and two objects were found.
There is a weak link between :15 and :19, but it did not help since
there is no link between :15 and :16. What happens is that regions
:1, :15, :13 and :22 all meet forming a vertex of type MULTI; this
vertex should (in some future version of SEE) be split into two,
since both :1 and :37 are the background. The rule for this splitting
seems to be:

:11 was joined with :4, but isolated from :12-27-5. There are
no T-joints between these two nuclei that could give 'hints' (i.e.,
links) for their unification.

The two large concave objects were properly isolated.

Compare with R19 and WRIST*.
See 'Merged vertices', page 221, in section 'On noisy input.'
FIGURE 'L19'

It was easy to find :6-7-8-9, the hexagonal prism.
:15 was reported as a single object: a mistake. The two big
concave objects were appropriately identified.
Scene R19. As in L19, here the triangle :27 is detached from
:5-32-33, two bodies being reported. There is no strong link between
:27 and :33. There is a weak link between :27 and :5, because both
are 'triangles' facing each other, but that is not enough. A weak
link is never enough.

All other bodies are properly found, including :10-16-2-3.

Vertex RA, of course, contributes no links.

SUGGESTION: The situation could change if we discover that RA is a
false vertex, that is, one composed by the merge of two genuine ones.
There is enough information, I think, since :34 and :37 are background,
and this will suggest a way to "divide" vertex RA into two simpler
ones. This idea of dividing vertices of type MULTI into simpler
ones should be applied with caution, since there will be genuine
vertices of type MULTI (which should not be split). The main use of
this technique will be for helping single regions to join some other
body, a task performed now, not too satisfactorily, by SMB.

Compare with L19 and WRIST*.
See 'Merged vertices', page 221.
Scene CORN. The pyramid :8-9-10 was easily identified because a vertex
of type PEAK produces many links. At the bottom, bodies :1-2-3-4 and
:12-13-11 were separated, because the Fork between :4 and :12 has the
background as a region, and did not contribute any links. Certainly,
this is a possible interpretation. Another interpretation is
to regard the object :1-2-3-4-11-12-13 as a prism with the shape
of a "C".

SINGLEBODY was needed to join :4 with :2-3-1, the only link
being placed by heuristic (g) of table 'GLOBAL EVIDENCE'.
The program knows that :22 is the background.
If we could see the hidden vertex KK (if it indeed exists),
two links would be put and we would have had one body:
FIGURE 'CORN'

The pyramid at the top was identified
properly. Two bodies were found at
the bottom, which is a plausible
interpretation: :1-2-3-4 and :11-12-13.
Scene L9. Here the tolerances SINTO and COLTO that allow for
"sloppy parallelism" have made T's out of NA and FA. Therefore,
these vertices do not contribute any links for :1. Moreover, the
"T" PA inhibits the link suggested by QA between :1 and :8.
That being all, :1 gets reported as a single body (see next page).
By decreasing the tolerances, correct identification is possible
(see the correct identification in page 155).

See 'Tolerances in collinearity and parallelism', page2.1£ .
Scenes R9 and R9T. Four bodies are found in R9, five in R9T. The
difference is that Y and JA (see figure at bottom of this page) are not
"matching T's" in R9T. The strong links among :12, :3, :10, and :16 are:

LINKS FOR R9

LINKS FOR R9T

In R9, the two strong links (G0030 and G0021) between :12 and :10
were put by the matching T's Z-EA and Y-JA; of the two strong links
between :10 and :16, one was because DA is an Arrow; the other,
because EA is a "T" for which heuristic (g) of table 'GLOBAL EVIDENCE'
applies.

But in scene R9T, not having Y and JA as matching T's, a link
between :10 and :12 disappears; and also nuclei :16 and :10 cannot
be linked by heuristic (g) of table 'GLOBAL EVIDENCE'. SEE decides
to report two bodies there: :3-12 and :16-10 instead of one
as in scene R9.




Are Y and JA matching 
T's or not? Different 
answers produce different 
analyses of the scene. 

These scenes show that the analyses can be quite sensitive to 

the "right" definition of parallelism and colinearity. 
FIGURE 'R9'

The four bodies were found.
SINGLEBODIES was needed to join :18
with :6-11-1-4-2.
FIGURE 'R9T'

SINGLEBODIES joins :18 with the
other portion of that body; LOCAL
is needed to join :6 to that
portion, and :16 with :10.
Nevertheless, since :12 and :10 were
not found to be the same face, body
:16-10 is found, and body :12-3.
Scene TRIAL. This scene has been analyzed in great detail in the
section that describes the program SEE. Its links are found in
graphic form in figure 'TRIAL - LINKS', or in written form (lists)
in 'RESULTS FOR TRIAL'.

LOCAL had to join :13 with the remainder of that body.

FIGURE 'TRIAL'
Scene ARCH. SEE analyzes scene ARCH (see figure 'ARCH') with results
displayed in 'RESULTS FOR ARCH'. This is a scene composed of many
degenerate views of objects. It is an ambiguous scene (see section
on Optical Illusions), in that several good interpretations are
possible.

The program reports :7 and :17 as one body, which could be plausible.
:16, :9 and :10 get reported as independent objects. In
the scene from which this picture or line drawing was taken, :7, :17
and :16 were the vertical face of an object. :10 was the vertical
face of another, :9 being its horizontal (top) face. In cases like
this, in order to choose the "right" one of several possible
interpretations, more information has to be supplied to the program,
such as lighting, textures, color, etc.

No link was put by A between :3 and :29, or by UB between :5 and
:19, because D and W are GOODTs. In one case, G provides more
links and causes :3-8-29-31 to be reported as one body, which is
correct; in the other case, Q cannot supply any links, and that
body is split in two: :5-4 and :19-18. This is a mistake of GOODT,
which accepts W as a genuine T. If this were not the case, the Arrow UB
would establish a link between :5 and :19, avoiding the mistake. GOODT
could stand some improvement.

The body :22-23 was identified correctly.
FIGURE 'ARCH'

Ambiguous scene that could be correctly interpreted in
several different manners. :7-17 was reported as a single
body (see table 'RESULTS FOR ARCH'), and also :9.

The body :5-4-19-18 was split in two: :5-4 and :19-18,
but not :3-8-29-31, which was counted as one body.
Scene HARD. The scene consists of objects of the same shape, namely
triangular prisms. All are correctly identified, including the long
and twice occluded :3-21-22-23-24-28-29. :1-2-33 was also found.
LOCAL had to be used to join :15 with :16, and also :11 with :12.

In an older version of the program, :7 was identified as a single
body, and :6 as another, because they have no visible "useful"
vertices to place links {Guzman PISA 68}. Now SEE joins :6 and :7,
because both are "GOODPALs". See 'Operation of the Program; SMB' (page
99).

These scenes are sometimes obtained from a picture, so that
they are the result of a perspective transformation. Some other
scenes are drawn more or less in an orthogonal or isometric projection.
SEE does not depend heavily on the type of projection; there are only
a few heuristics that use notions of parallelism.
FIGURE 'HARD'

All the bodies were correctly found.
The most difficult was :6-7, since SMB
had to join both regions, which do
not have "useful" visible vertices.
Scene L4. The body :10-9 was reported isolated from :13-2-3,

due to insufficiency of links. See comments to figure R17, also.
The algorithm that localizes matching T's could stand improvement.
It sometimes produces "bad links" such as between :4 and :13, and
between :6 and :3, because it found two T's that looked like they
were matching, EA and R in this case (this mistake did not happen,
actually, because vertex R is not a T, but a Fork!). The suggestion
in page 173 will lessen, but not suppress, these "mistakes".
Scene R4. The table 'RESULTS FOR R4' shows what happens when the
tolerances are too large. Five bodies are found. Vertex B is
considered to be a "T", and inhibits the links suggested by the Arrows
R and A. As a result, :1 gets cut off from :7-9-5-10.

The way :2 gets isolated is as follows: T and AA claim to be
matching T's, the link suggested by U is inhibited by Z (a Corner),
and :2 gets disconnected from :3-4.

The correct solution is obtained after reducing the values of
COLTO and SINTO to 0.05 and 0.005, respectively (see listings; COLTO
decides if two lines are colinear, SINTO if they are parallel). The
results appear also in 'RESULTS FOR R4', and we can see now that only
three bodies (the correct ones) are identified.
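
How a tolerance can flip such a decision is sketched below. The text says only that SINTO decides parallelism and COLTO colinearity; the exact tests are in the listings, which are not reproduced here. The sketch therefore assumes, purely for illustration, that both tolerances bound the sine of an angle; the function names and the sample segments are hypothetical.

```python
import math

def sine_between(d1, d2):
    """Absolute sine of the angle between two direction vectors."""
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    return abs(cross) / (math.hypot(*d1) * math.hypot(*d2))

def parallel(seg1, seg2, sinto):
    """Assumed test: segments are parallel if the sine is within SINTO."""
    (a, b), (c, d) = seg1, seg2
    return sine_between((b[0] - a[0], b[1] - a[1]),
                        (d[0] - c[0], d[1] - c[1])) <= sinto

def colinear(seg1, seg2, sinto, colto):
    """Assumed test: parallel segments whose joining line follows seg1."""
    (a, b), (c, _) = seg1, seg2
    if not parallel(seg1, seg2, sinto):
        return False
    join = (c[0] - a[0], c[1] - a[1])
    if join == (0, 0):
        return True                     # the segments share their first point
    return sine_between((b[0] - a[0], b[1] - a[1]), join) <= colto

# Hypothetical noisy data: the second segment is tilted by about 1.7 degrees.
s1 = ((0.0, 0.0), (10.0, 0.0))
s2 = ((20.0, 0.5), (30.0, 0.8))
loose = colinear(s1, s2, sinto=0.05, colto=0.05)
strict = colinear(s1, s2, sinto=0.005, colto=0.05)
```

With the loose value the two segments pass as colinear, which is how a pair of spurious matching T's can arise; with the strict value they do not.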

SUGGESTION: Lines like the one below should be
"straightened" either by SEE or (better) by the preprocessor; for
example, B K L N and D G H in figure R17. See section 'On Noisy
Input'.

Conservatism and Tolerance. More strict tolerances do not make the
program more conservative in all cases: the link in (a) fails to be
placed if the program has too loose (large) tolerances, because A
will be transformed into a "T" (it will be considered to be a "T"),
losing the link; the link in (b) fails to be laid if the tolerances
are too strict, because the T-joints will not be colinear.

In (a), links disappear if tolerances are
too big; in (b), if they are too small.
In both cases, conservative behavior (cf.
page l\7.) appears.
Either three or five bodies are found, depending on
certain parameters. These scenes are "noisy" in the sense that
the coordinates of the vertices depart from their "ideal" position
by as much as one millimeter, or about 1 % of the total size of
the image, which is about one decimeter. This error is not large
enough to affect long lines, but it may substantially change the
direction of short segments.
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Scene MOMO

The body :29-30-34-20-19 gets identified as follows:
:29 and :30 get two links, and :30 with :19 also, so we have the
nucleus :29-30-19. Two links (because of matching T's) join :34 with
:20, to form nucleus :34-20. Regions :30 and :34 receive a strong
link, by heuristic (g) of table 'GLOBAL EVIDENCE', and :19 with :20
for the same reason. That completes the body.

The Fork that is common to :12, :13 and :14 puts a link between
:12 and :13, but it is not enough to cause misrecognition. A link
is put by that same Fork between :13 and :14, as it should be, but
the link between :12 and :14 is inhibited by NOSABO.

There is a program that finds regions of a scene belonging to
the background, when not indicated as such in the input. For MOMO,
the results of this program appear in page l**' .
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MOMO 




FIGURE 'MOMO' 
All bodies are correctly identified.
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Scene BRIDGE

Region :10 gets a strong and a weak link with :4, and that
is enough to join them. The same is true for :7.

The links of scene BRIDGE (see 'RESULTS FOR BRIDGE') are discussed
and displayed in pages 95-98, figures 'LINKS-BRIDGE' (page 95),
'NUCLEI-BRIDGE' (page 96), 'NEW-NUCLEI-BRIDGE' (page 97), and
'FINAL-BRIDGE' (page 98).

Because RA and SA are matching T's, two wrong links are placed:
one between :22 and :28, and the other between :21 and :29. This is
not enough to cause an error, because we need two mistakes reinforcing
each other (two wrong strong links) to fool the program. But
that could happen.

It is interesting to note the way in which the long "horizontal
table" :25-24-21-27-9-12 was put together. To this effect, see figures
'LINKS-BRIDGE' and 'NUCLEI-BRIDGE'.

Vertex JB produces only one link between :5 and :8. Vertex KB
inhibits the link (through NOSABO) between :8 and :9, and the link
between :5 and :9 gets inhibited by S, because it is a T (cf. NOSABO,
page 82).

The concave object :7-6-5-4-8-10-11 gets properly identified.
We may say that, in general, the more "crooked" or complicated an object
is, the easier it will be for SEE to isolate it, because there will be
many vertices contributing valuable links.

No mistake was made by SEE on BRIDGE; its eight bodies were
correctly identified (see 'RESULTS FOR BRIDGE', page la').

The background of 'BRIDGE' was also correctly isolated; see
page 230, section 'On background discrimination by computer'.
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BRIDGI 




FIGURE 'BRIDGE' 
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DISCUSSION

We have described a program that analyzes a three-dimensional
scene (presented in the form of a line drawing) and splits it into
"objects" on the basis of pure form. If we consider a scene as a set
of regions (surfaces), then SEE partitions the set into appropriate
subsets, each subset forming a three-dimensional body or object.

The performance of SEE shows to us that it is possible to separate
a scene into the objects forming it, without needing to know in
detail these objects; SEE does not need to know the 'definitions' or
descriptions of a pyramid, or a pentagonal prism, in order to isolate
these objects in a scene containing them, even in the case where they
are partially occluded.

The basic idea behind SEE is to make global use of information
collected locally at each vertex: this information is noisy and SEE
has ways to combine many different kinds of unreliable evidence to
make fairly reliable global judgments.

The essentials are:

(1) Representation as vertices (with coordinates), lines and regions.

(2) Types of vertices.

(3) Concepts of links (strong and weak), nuclei and rules for
forming them.

The current version of SEE is restricted to scenes presented in
symbolic form.

Since SEE requires two strong evidences to join two nuclei, it
appears that its judgments will lie on the 'safe' side. That is, SEE
will almost never join two regions that belong to different bodies.
From the analysis of scenes shown above, its errors are almost always
of the same type: regions that should be joined are left separated.
We could say that SEE behaves "conservatively," especially in the
presence of ambiguities.
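
The two-strong-links rule can be illustrated with a small sketch. It is a deliberate simplification written for this discussion, not the procedure of the listings: it ignores weak links and the separate GLOBAL and LOCAL passes, and the name `merge_nuclei` and the encoding of links as pairs of region numbers are assumptions.

```python
def merge_nuclei(regions, strong_links):
    """Join nuclei that share at least two strong links (simplified sketch).

    regions      -- iterable of region names; each starts as its own nucleus
    strong_links -- list of (region, region) pairs, one entry per strong link
    """
    nuclei = [{r} for r in regions]
    merged = True
    while merged:
        merged = False
        for i in range(len(nuclei)):
            for j in range(i + 1, len(nuclei)):
                # count strong links running between nucleus i and nucleus j
                count = sum(
                    1 for a, b in strong_links
                    if (a in nuclei[i] and b in nuclei[j])
                    or (a in nuclei[j] and b in nuclei[i])
                )
                if count >= 2:
                    nuclei[i] |= nuclei[j]
                    del nuclei[j]
                    merged = True
                    break
            if merged:
                break
    return nuclei
```

A single strong link, however plausible, never joins two nuclei in this sketch; this is the conservative behavior described above, since an isolated wrong link leaves the regions separated instead of fusing two bodies.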

Division of the evidence into two types, strong and weak, results
in a good compromise. The weak evidence is considered to favor linking
the regions, but this evidence is used only to reinforce evidence from
more reliable clues. Indeed, the weak links that give extra weight to
nearly parallel lines are a concession to object-recognition, in the
sense of letting the analysis system exploit the fact that rectangular
objects are common enough in the real world to warrant special
attention.

Most of the ideas in SEE will work on curves too.
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CURVED OBJECTS



How to extend SEE to work with objects possessing curved surfaces. 

Introduction and Summary

Most of the heuristics that establish links at each vertex are
unconcerned if the edges are curved or straight; a few heuristics get
affected: those that use the concepts of colinearity and parallelism.

Thus, it is necessary to redefine and broaden these concepts.

1. A slight generalization is obtained if each segment is represented
as having two slopes (initial and final). The functions PARALLEL and
COLINEAR of SEE are already modified for this (cf. listings).

SEE does not care if the line joining two vertices is a straight
or curved line. The information about the segment A-B that is
relevant to SEE is:

(a) There is a line between vertex A and vertex B.

(b) The coordinates of A and B.

(c) The segment A-B separates region :1 from :2.

2. Attempts to take limited account of the shape of the segment carry
us to

(a) gently bent segments (definition) are those with bounded slope
[bounded curvature will lead to another definition].

A quasi-rectilinear object has faces, vertices and gently
bent edges or segments; it is expected that SEE will work
well for them. | SUGGESTION | We should try some scenes.





a, b: gently bent segments. c: non-gently bent
segment. A gently bent segment has a slope that
at any point of the segment does not differ more
than epsilon from the mean slope of the segment.
All slopes fall in an interval around the mean
slope. Gently bent segments form quasi-rectilinear
objects.
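
The definition in this caption can be made concrete as follows. The sketch assumes, purely for illustration, that a segment is given as a list of sampled points, that "slope" is measured as a direction angle in radians, and that the mean slope is the average of the local angles; none of these choices comes from the listings.

```python
import math

def local_angles(points):
    """Direction angle (radians) of each pair of consecutive sample points."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def is_gently_bent(points, epsilon):
    """True if no local slope differs from the mean slope by more than epsilon.

    Assumes the angles do not wrap around +/- pi along the segment.
    """
    angles = local_angles(points)
    mean = sum(angles) / len(angles)
    return all(abs(a - mean) <= epsilon for a in angles)
```

For instance, a shallow arc such as (0,0), (1,0.05), (2,0.05), (3,0) is gently bent for epsilon = 0.1, while a right-angle bend such as (0,0), (1,0), (1,1) is not.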
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Quasi-rectilinear objects. It is expected
that SEE will work well for them.



(b) partition of a non-gently bent segment into several gently
bent ones. Many of the bodies have vertices and curved edges,
but the bodies are not quasi-rectilinear (a piece of chewed
gum, leaves of a tree). By breaking the edges into gently
bent sub-segments, they become quasi-rectilinear bodies.
The breaks will occur at points where the curvature is large.
A way has to be devised to break a segment in a unique
manner. To avoid breaking a body into two by the introduction
of these artificial vertices, we propose to introduce
also artificial links between regions, to account for the
artificial vertex.

The non-gently bent segment ab
gets broken into gently bent segments
ak, kl, lm, mb, by the
artificial introduction of "new"
vertices k, l, m.

Here, the introduction of 
additional vertices has to 
be accompanied by 'artifi- 
cial' or reinforcing links, 
to preserve the individua- 
lity of the body (of the 
owner of such vertices) . 
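
One possible way of breaking a segment is sketched below. It is a greedy left-to-right split, chosen here only because it gives a unique answer for a given list of sample points; the text leaves the actual breaking rule open, so the function name `split_gently` and the strategy are assumptions.

```python
import math

def _angles(points):
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def _gently_bent(points, epsilon):
    a = _angles(points)
    mean = sum(a) / len(a)
    return all(abs(x - mean) <= epsilon for x in a)

def split_gently(points, epsilon):
    """Split a sampled segment into gently bent pieces (greedy sketch).

    Consecutive pieces share one point: the artificial vertex.
    """
    pieces = []
    start, end = 0, 2          # points[start:end] is the current piece
    while end <= len(points):
        if end < len(points) and _gently_bent(points[start:end + 1], epsilon):
            end += 1           # the next point still keeps the piece gentle
        else:
            pieces.append(points[start:end])
            start = end - 1    # last point of this piece starts the next one
            end = start + 2
    return pieces
```

Each shared point in the result is where an artificial vertex, and hence an artificial reinforcing link, would have to be introduced.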





3. More complete consideration of the shape of the segments is
obtained as follows:

(a) For parallelism, by requiring that two segments be parallel 
only if one is a translation of the other. Generally, this 
is a comparison that takes a time proportional to the length 
of the segment. Chain encoding {Freeman} {Conrad} is suggested. 
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(b) For colinearity, by discovering properties or features that
"carry through" or are common. Among these are:

1. Mathematical "regularity" of the segments. Both segments 
are described by the same or similar polynomials, etc. 

2. Heuristic properties: there must exist properties which 
will select with high probability the "right" continua- 
tion. 

3. Outside of the set of geometric properties, we have 
color, texture, etc. 




The same line disappears at b and appears
at c, making b and c "matching Ts", but to
discover this fact it is necessary to have a
concept of "good continuation" or "good contour".



Alternatively, we may forget these properties here and include
them into models of our curved objects, but then we are forced
to make searches in our scene like those made by OT or TD
{my M.S. Thesis}.





Fig. 'SUITCASE S' 

Heuristic properties of segments (yet to be 
determined) could select a "correct" match 
for endings a, b, ..., k,l. 
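
For the parallelism test of item 3(a), a chain-encoded comparison might look like the sketch below. It assumes curves sampled on a unit grid with 8-connected steps and uses the usual 8-direction numbering (0 = east, counted counter-clockwise); the function names are hypothetical, and both curves are assumed to be traversed in the same sense.

```python
# Chain code of each unit step (dx, dy): 0 = east, counter-clockwise.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def chain_code(points):
    """Chain code of a curve given as consecutive grid points."""
    return [DIRECTIONS[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def is_translation(curve_a, curve_b):
    """Two curves are translations of each other iff their chain codes agree."""
    return chain_code(curve_a) == chain_code(curve_b)
```

The comparison visits every step once, so its cost grows with the length of the segment, as noted in 3(a).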
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4. Bodies with no edges and vertices are in principle easily
identified by SEE. See fig. 'FRUIT'.




Figure 'FRUIT' 



The bodies have no curved edges, and no vertices. The entire 
surface is smooth; no sharp edges or pointy corners. Examples: 
an inflated balloon, a frankfurt, a face, a cloud. 

It is doubtful that we could do something here with SEE. We
could try to postulate "artificial" vertices, using stereo perhaps,
at the points where the 3-dim curvature is large, and then postulate
lines between such vertices. This looks bad.

Or we could reason as follows: Since these objects do not
have vertices or edges, then the only vertices appearing in the
scene must separate two bodies. They will be mainly T-joints
(cf. also page 46).

In principle, separation into bodies looks promising, but
recognition (the answer to "what is the name of this object?")
seems difficult. Nevertheless, it is not clear that with such a
simple set of heuristics we could work successfully with objects
as complicated as a human face, a blob of falling water, an
amoeba, the surface of the sea (?).



At some point, we have to know what we want

As the complexity increases, the concept of "body" depends less and
less on geometrical properties (disposition of edges, vertices, ...)
and more and more on purpose (Is a skeleton an object? Or perhaps the
femur bone alone? The answer varies with our intention, with the
context).

Thus, models are necessary again.
See also 'Do not use over-specialized assumptions...', page 252.






REQUIREMENTS FOR THE PREPROCESSOR

APPENDIX TO SECTION ON CURVED OBJECTS
This appendix may be omitted in a first reading.

Requirements for the preprocessor

The preprocessor that feeds data to SEE has to find only:

1. The lines of the scene. 

2. The vertices. 

3. The local slopes at each vertex. 

4. See also comments to figure R17. 

5. Illegal scenes (page 217) should be detected by the preprocessor.



How bad will curved objects be

In objects where the curved edges are gently bent, SEE will work
fairly well. The more an edge departs from its rectilinear
equivalent, the worse SEE will work; T-joints will be difficult to
find, a FORK may transform into a 'T', etc. (I am talking about the
current SEE, described in the listings).




Additional information could be used

So far, we are trying to identify objects on the basis of form
alone, i.e., geometrical considerations. This is asking a machine to
do more than a human being does. Ambiguous line drawings, such as
ARCH, become unambiguous when we introduce shading, lighting, texture,
color, etc. All of these properties could be used by SEE. In fact,
consider how easy it would be to identify bodies if each one of them
were of a different color (and we could sense that fact).

Psychological evidence

Knowledge of the algorithms used by human beings for shape
continuation (page J8&) is relevant. We quote from
Krech and Crutchfield {1958}:
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Grouping by Good Form. Other things
being equal, stimuli that form a good figure
will have a tendency to be grouped. This
is a very general formulation intended to
embrace a number of more specific variants
of the theme, traditionally classified as follows.

1. Good continuation. The tendency for
elements to go with others in such a way as
to permit the continuation of a line, or a
curve, or a movement, in the direction that
has already been established (see Fig. 37c).

2. Symmetry. The favoring of that
grouping which will lead to symmetrical
or balanced wholes as against asymmetrical
ones.

3. Closure. The grouping of elements in
such a way as to make for a more closed or
more complete whole figure.

4. Common fate. The favoring of the
grouping of those elements that move or
change in a common direction, as distinguished
from those having other directions
of movement or change in the field.

It seems plausible to consider that the
percepts resulting from all of the above
determinants would be such as to meet the
criterion of a good figure, that is, one that
tends to be more continuous, more symmetrical,
more closed, more unified.

Now the reader will see that a difficulty
with this general proposition regarding
grouping centers on the crucial phrase
"good figure." How can we know which



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FIG. 37. Examples of grouping. In a, the dots
are perceived in vertical columns, owing to
their greater spatial proximity in the vertical
than in the horizontal direction. In b, with
proximity equal, the rows are perceived as
horizontal, owing to grouping by similarity. In
c, the principle of good continuation results in
seeing the upper figure as made up of the two
parts shown to the left below, even though
logically it might just as well be composed of
the two parts shown to the right below, or indeed
of any number of other combinations of
two or more parts. (Adapted from Wertheimer,
1923.)
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BOX 21
How to Measure "Goodness"

Attneave has made an ingenious experimental
attack on the problem of measuring
the "goodness" of a figure. The subject is
given a sheet of graph paper composed of
4,000 tiny squares (50 rows by 80 columns).
His task is to guess whether the color of
each successive square is black, white, or
gray. The experimenter has in mind what
the completed figure will look like (fig. a).

Without knowing what the completed
figure will be, the subject starts by guessing
the square in the lower left corner. When
he has correctly identified the color, he
moves on to guess the next square to the
right. He continues this process to the end
of the row and then starts on the left end
of the next row above. In this manner he
successively guesses each of the 4,000
squares.

On the average, Attneave's subjects made
only 15 to 20 wrong guesses for the entire
figure. How was this possible? The answer
is that the figure was deliberately designed
so that knowledge of parts of the figure
was sufficient to enable the subject to make
fairly valid predictions about the remainder
of the figure. This was accomplished by
making all the white squares contiguous
with one another, and similarly the black
and the gray squares. Moreover, the contours
separating the white, black, and gray
areas are simple and regular. Where the
figure tapers, it tapers in a regular way.
And it has symmetry; after exploring one
side, it is easy to predict the other side.
Thus, the subject having discovered that the
first few squares are white continues to guess
white, and he is correct until he hits the
gray contour at the »ath column. After one
or two errors, he then continues to guess
gray. On the next row above, he tends to
repeat the pattern of the first.

All these factors of compactness, symmetry,
good continuation, etc. are aspects of
what is implied by a "good figure." Thus an
objective measure of the "goodness" of a
figure is the ease with which the subject
can predict its total form from minimal
information about a part.

Other figures can be similarly tested. For
example, figure b would prove to be a less
"good" figure because the number of errors
in guessing would be larger.

Attneave's particular method will not, of
course, apply to all kinds of figures or all
kinds of perceptual organizations. But it
does demonstrate that there are ways in
which "goodness" can be objectively
determined.










configuration of stimuli is "better" than
another?

To escape from that difficulty, we need
to have independent criteria of what is a
good figure. Some approach can be made
to this; for instance, in the case of "symmetry"
there are objective rules we can
apply to determine the relative symmetry
of various figures. The same is true of simple
cases of "closure." (See Box 21 for a
relevant experiment.)

But we are far from being able to state
such criteria when we deal with the highly
complex configurations of our normal perceptual
experience. Part of the difficulty
stems from the fact of individual differences
among perceivers. One man's mess
may be another man's order. And this may
reflect the important role of learning and
past experience in the genesis of "good
figure."
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ON OPTICAL ILLUSIONS 



Figure: dictionary entry for "illusion", with illustrations of
optical illusions ("A" is the Müller-Lyer illusion).




Given the nature of SEE, we will restrict the meaning of 'optical
illusion' to illusions formed by solids, that is, ambiguities or
inconsistencies when we (or the program SEE) try to find 3-dim bodies
in a scene; thus, the Müller-Lyer illusion ("A" in the topmost figure)
is not considered.

According to this, we may elementarily classify the "scenes that
are unlikely to occur" (that is, those that are not "standard" or
"normal") in three types:

== Possible but no "good" interpretation.

== Ambiguous - several good interpretations.

== Impossible: without interpretation.

Like POLYBRICK {Guzmán}, SEE is not specifically designed to
handle optical illusions. It was primarily designed to analyze "real
world" scenes; hence, an input scene that produces an illusion (in
a human) is not likely to occur as input to SEE. Nevertheless, in
the same way that we may overtest a program for square roots by asking
for the square root of 'APPLE', so we may test SEE with some
ambiguous scenes. Let us see what happens.



POSSIBLE BUT NO "GOOD" INTERPRETATION

Some objects do not 'make sense' because they violate rules that
most objects obey. Nevertheless, it is possible to have them.
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ACTUAL IMPOSSIBLE TRIANGLE was constructed by the author and his colleagues.
The only requirement is that it be viewed with one eye (or photographed) from exactly
the right position. The top photograph shows that two arms do not actually meet. When
viewed in a certain way (bottom), they seem to come together and the illusion is complete.

(From Gregory).
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One of the strong rules used by humans is that objects whose
pictures show straight lines have indeed straight edges; another strong
rule is to assume the corners to be like the corners of a cube (faces
meeting at right angles). Under these rules, the above triangle
does not make sense and people will classify it as an "impossible"
object ('VARIANT' will be an "impossible" object; Penrose's Triangle
will be "3 sticks forming an impossible configuration or scene";
"mounted in a funny way"; cannot be seen as representing a single
object lying in space). For instance, Gregory {Scientific American}
tries to explain that the triangle has a real 3-dim object as
originator, by constructing a body consisting of three rectangular
parallelepipeds ("bricks") joined at right angles, and then taking a
picture from a special direction, so that the free ends a and b
seem to touch:



Fig. 'VARIANT'



These rules (faces meet at right angles; straight lines mean
straight edges) are deeply ingrained into people, but nature does not
need to follow them always. The Penrose Triangle can be obtained by
photographing a 3-dim triangle with curved edges and skewed corners,
where each side touches the other two.

SEE finds three objects in figure 'Penrose Triangle'.
Other examples follow.




Figure 'BLACK' 

People assume that faces meet at
right angles, and this object
violates that rule, making it
"impossible" or odd-looking.
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It is possible to construct object 'BLACK' with planar faces. See
figure 'TEST OBJECTS', page 209. SEE finds one body in 'BLACK'.

The object at right looks
impossible if we assume all
faces to be flat. If face aeb
is curved, the object is plausible.
R is its reflection on mirror
M; a smoother version of R is
also shown. The smoother version
looks "normal"; by deforming it
we could obtain R.

Unlike humans, SEE does not
hold these "very common rules"
as inviolable; SEE does not
have any special problems with
these "strange but true"
objects.

A misleading suggestion of
superiority should not be concluded
from these rare cases; in other
situations SEE makes mistakes
that a human being does not
(see figure 'SPREAD').

Of course, SEE holds its own rules (for example, those of
table 'Global Evidence') as inviolable; hence, given a "rare enough
scene" it will make mistakes (cf. assertion in page S"i , after the
Theorem). This is a similarity of behavior, I think, between people
and SEE: each one follows rather rigidly a small set of rules
(see also conclusion at end of section).

Besides, often humans will see the 'impossible' object as an
object, doing SEE's job just as well.
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Figure 'STAIRCASE'

Figure 24. 'Impossible Object.' This can be drawn, but it corresponds
to no possible physical object. (From Penrose, L. S. and Penrose,
R. (1958). Brit. J. Psychol., 49, 31.)
(caption by Gregory)



The "always descending staircase." {Gregory, in {Foss}}
The caption is wrong: this object could be constructed in the real
world if some surfaces are curved and/or the faces at the corners do
not meet at right angles. Example of an object "possible but without
'good' interpretation." See also Metatheorem on page 2>9. Again, the
"impossibility" or oddness of 'STAIRCASE' comes from assuming the rules
'straight lines in the drawing correspond to straight edges in 3-dim'
and 'faces meet at right angles, like corners of a cube' to be
inviolable.

AMBIGUOUS - TWO GOOD INTERPRETATIONS

These are scenes that can be interpreted in several correct
(non-paradoxical) manners, which are also "sensible" (as opposed to
the Trivial Solution of page «(l ).
For instance, a scene like




that can be interpreted as
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(A) 



or as 




(B) 



SEE will generally give one of the possible answers, although
not necessarily the one preferred by humans. In this example, SEE
chose (B).

The following scene, locally ambiguous, is correctly parsed by
our program.





Sometimes, the conservatism of SEE and its partial
inability to make very global judgments will leave a body
unconnected; for instance, the three faces of one cube below will
each be reported as a separate object, due to insufficient
links.
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IMPOSSIBLE: WITHOUT INTERPRETATION

Images that cannot be the product of photographing (projecting) a
3-dim scene. These objects do not have physical existence.



This scene is without
interpretation, meaning
no 3-dim scene (with 3-dim
bodies) could have
produced it.




In figures like the above one, men are unaware of the extension
of the background, and it makes sense even if B is background.
SEE is unable to make this mistake, and its analysis of
the scene will reflect the fact: the preprocessor will complain that
one region, the background, is a neighbor of itself. See comments to
scene R3, page 113.
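
The check mentioned here (a region that is its own neighbor) is easy to state in code. The sketch below covers only this one condition, not the full definition of illegal scenes, and the encoding of a scene as a list of (vertex, vertex, region, region) tuples is an assumption made for the example.

```python
def self_neighbor_regions(segments):
    """Return the regions that lie on both sides of some segment.

    segments -- list of (vertex_a, vertex_b, region_1, region_2) tuples,
                a hypothetical encoding of "segment A-B separates
                region_1 from region_2".
    """
    return sorted({r1 for _, _, r1, r2 in segments if r1 == r2})

def has_self_neighbor(segments):
    """True if at least one region is a neighbor of itself."""
    return bool(self_neighbor_regions(segments))
```

A preprocessor could run such a test before handing the scene to SEE and refuse the data when it succeeds.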

Of course, in these cases there is no answer to the question
"which are the bodies in the scene?" Whatever answer SEE (or anybody
else) gives, it is wrong.

Nevertheless, according to our meta-theorem (page 33), there is
an extremely easy way to discover and reject these impossible scenes:
all of them are necessarily illegal scenes (q.v., page 217). And we know
how to detect illegal scenes. SEE (or its preprocessor, rather) already
does that.

SEE detects all impossible scenes, by refusing the data as an
illegal scene.
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A PROGRAM TO DISCOVER HUMAN OPTICAL ILLUSIONS

Some scenes get classified by our metatheorem as 'possible but
not "good" interpretation', and likewise by SEE, which does not refuse
to analyze any legal scene.

Nevertheless, a person will stubbornly classify them as
'odd-looking' or 'not making sense' or 'impossible', even if we teach
him the solution obtained by SEE (figures 'Penrose Triangle', 'Black',
'Staircase', 'CONTRADICTORY').





Figure 'CONTRADICTORY'

One object is found by SEE: (:1 :2 :3 :4).
As such (since it is a legal scene), SEE
classifies it as 'possible but not "good"
interpretation'. A person will classify
it as "not making 3-dim sense": a human
optical illusion. Is it possible to
reconcile these views?

Of course, the metatheorem (page '2>9 ) insures that there is at
least one solution, so SEE's interpretation is "right" (it has chosen
one correct answer, generally not the trivial solution given by the
metatheorem), and the mortal is wrong. Also, the theorem of page W
insures that any system (human or computer) that uses too "local"
rules (see fig. 'MACHINE') will make at least one mistake, no matter
what rules he (or it) uses.
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h-optical illusions

There is thus a disagreement between SEE and our fellow subject,
because SEE has classified the scene as 'possible but no "good"
interpretation' and our man has said 'contradictory as a
three-dimensional scene'. Let us call these human optical illusions
(such as 'Contradictory', 'Staircase', etc.) by the name h-optical
illusions.

What to do in these disagreements? Who is right? 

SEE is right

Above comments seem to indicate that the electronic
data-processor is correct. The human has used excessively "local"
rules. That being the case, we can teach and train (if avoiding
future errors is desirable) our subjects to "understand", rationalize
and make sense out of these h-optical illusions. Indeed, that is what
is tried in figures 'Black', 'Penrose Triangle', etc. Different
people may show different degrees of (h-optical) illusion before
training and after training (see Box). This training is possible
(see Box).

In other words, if SEE is right, the computer scientist has
nothing to do; it is all up to the psychologists and educators.

Man is right

We may hold the view that the human answer is still
preferable. Then, to our relief, man is right and SEE is wrong.
It is necessary (perhaps) to modify and correct SEE, so as to emulate
personal behavior. We suggest a way to do this.

A program to discover h-optical illusions

| SUGGESTION | It is possible to enable SEE to detect these
h-optical illusions, so that it will classify the legal scenes into
"possible" or "h-optical illusions."

Like the problem of discriminating between background
and objects (see section 'On background discrimination by computer'),
this is an interesting project from the "psychological" point of view
but, as in the background case, it is not essential at the moment
for our vision-robot work.



* 

Strictly, there is a third possibility: both are wrong. 
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BOX 



There is generally a wealth of available information— though none entirely 
reliable— for settling the size and distance of external objects, with sufficient 
precision for normal use. As is well known, the visual system makes use of 
a host of 'depth cues', such as gradual loss of detailed texture with increasing 
distance, haziness due to the atmosphere and nearer objects partly hiding 
those more distant. These cues were discussed in the nineteenth century 
by the great von Helmholtz (1925), who fully realised their importance, and 
they have been the subject of many investigations since, especially by 
J. J. Gibson (1950). Whatever the richness of depth cues, however, the visual 
input is always ambiguous. Though the brain makes the best bet on the 
evidence — it may always be wrong. 

The kind of mistakes which occur when the bet is on the favourite though 
the favourite is not placed, is shown most dramatically by the demonstrations 
of Adelbert Ames (1946). The most impressive demonstration is given 
simply with a room which is non-rectangular, but so shaped that it gives the 
same retinal image as a rectangular room to an eye placed in a certain 
position. Now clearly this room, though queer shaped, must appear the 
same as a normal rectangular room, for it gives the same image to the eye. 
But consider what happens when objects are placed inside the Ames room. 
The further wall recedes at one side, so that an object or person standing in 
one corner is actually at a different distance than is a second object placed 
at the other far corner. These objects (or people) appear, however, to be 
at the same distance— and they are seen the wrong size. This is clear evidence 
that we assume rooms to be rectangular (because they usually are) and we 
interpret the size of objects according to their distance as given by this 
assumption. When the assumption is wrong we see wrongly. What Ames 
did was to rig the odds, and then we make the wrong decision on size and 
distance. A child may appear larger than a man. We may know this is 
absurd and yet continue to see a bizarre world. The retinal image is all 
right, but the odds have produced the wrong internal file cards and then the 
human seeing machine is upset, and gives a wrong answer. 

It is interesting that the Ames room is seen correctly by peoples, such as 
the Zulus, brought up in a 'circular culture' of beehive huts where there are 
few reliable perspective features, such as rectangular corners and parallel 
lines, in their visual environment. To the Zulus, the odds are not rigged by 
the Ames room— to them this is not misleading perspective. They are not 
subject to this illusion, but accept the room as the shape it is, and see the 
objects in it correctly in distance and size. This is a matter of very real 
importance. It shows that when we are transferred to an alien or bizarre 
environment, where our filing cards are inappropriate, we interpret the 
images in the eyes according to principles found reliable in the previous, 
familiar world -- but now they may systematically mislead and then
perception goes wrong. Space travellers beware! {Gregory, in {Collins
and Michie}}

A possible way to attack the problem is

(1) To identify each link with whoever proposed it.

(2) To set up systems of simultaneous "symbolic" equations.

(3) To solve them by elimination.

We elaborate:

(1) Mark each link with the name of the heuristic that produces it.
After obtaining the 'maximal' nuclei by GLOBAL and LOCAL, several
links are left (for example, three in fig. 'FINAL-BRIDGE')
and ignored by the current SEE. Instead, one could see what
kind of links they are, and in this way obtain more information
about the type of contradictions in the scene.

(2) Introduce a 'conditional' link: regions :1 and :2 belong to
the same body if region :3 does not. An OR link is now possible
by use of the conditional, since a → b ≡ b ∨ ¬a.

(2.3) Introduce a 'NOT' link: :3 ≠ :5, regions :3 and :5 do not
belong to the same body.

(2.6) As in ordinary algebraic equations, a system of n simulta- 
neous equations means that all of them must be satisfied; 
the "AND" of all must be true. Thus, AND is implicit in our 
notation. So far, we have OR, AND, NOT, IMPLIES (conditional): 
we have more than necessary. 

At the end, we have a system of simultaneous equations
like these, where :1 ≡ :2 means both belong to the same body; this
is an equivalence relation, so I use the ≡ sign:

:1 ≡ :2 OR :3 ≡ :5

:3 ≠ :2 → :1 ≡ :4



We now proceed to "solve" these equations. Three things could happen:

== Exactly one solution is found. This is the normal case, and
that solution tells what the bodies are. Familiar, "clear", possible
scenes will fall in this case.

== More than one solution is found consistent with our equations.
All are reported. This is the case "Ambiguous -- several good
interpretations."

== No solution is found. This is a genuine optical illusion,
corresponding to a contradiction in the equations. For instance, in
fig. 'CONTRADICTORY', equations set by the T-joints between :2 and
:3 would be inconsistent with those set by the Arrows and Forks.

How to solve the equations (E). By a solution to (E) we mean a division of
the scene (:1, :2, ..., :n) by means of a partition of the form

(:1 ≡ :5 ≡ :7 ≡ :6),

(:3 ≡ :2),

(:4)

which is consistent with (E).

In the current SEE, 

(a) The equations are only equalities: :1 = :2. 

Also, equations of the type :1 ≠ :2 are taken into
account by inhibitory mechanisms, such as NOSABO.
No conditional links exist.

(b) Since all equations are of the type :2 ≡ :3, the solution
is obtained by applying transitivity, that is,

1 ≡ 2, 2 ≡ 3 ⇒ (1 ≡ 2 ≡ 3)        (parentheses indicate nuclei),

except that we require two antecedents for application
of transitivity (two strong links):

1 ≡ 2, 1 ≡ 2, 2 ≡ 3, 2 ≡ 3 ⇒ 1 ≡ 3 ⇒ (1 ≡ 2 ≡ 3)
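The two-link requirement can be illustrated with a minimal Python sketch. It is not the SEE program itself: the function name form_nuclei, the representation of a link as a pair of region numbers, and the parameter needed are assumptions made only for this illustration.

```python
from collections import Counter

def form_nuclei(regions, links, needed=2):
    """Merge nuclei that are joined by at least `needed` links (hypothetical sketch)."""
    # Every region starts as a nucleus containing only itself.
    nucleus = {r: frozenset([r]) for r in regions}
    changed = True
    while changed:
        changed = False
        # Count the links that run between two different nuclei.
        counts = Counter()
        for a, b in links:
            na, nb = nucleus[a], nucleus[b]
            if na != nb:
                counts[frozenset([na, nb])] += 1
        for pair, n in counts.items():
            if n >= needed:
                na, nb = tuple(pair)
                merged = na | nb
                for r in merged:
                    nucleus[r] = merged
                changed = True
                break  # recount the links after every merge
    return sorted({tuple(sorted(n)) for n in nucleus.values()})
```

With links [(1, 2), (1, 2), (2, 3), (2, 3)] the three regions end in one nucleus; with only a single link between 2 and 3, region 3 stays apart.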

An exhaustive search (which successively tests each possible partition)
for the solution to (E) is impractical except in very small
scenes, and heuristic methods are needed.

I suggest starting from the equalities, such as 1 ≡ 2 and 2 ≡ 3,
and forming nuclei with the current SEE, except that at each step
we check whether our current nuclei satisfy all of (E); for
disjunctive equations such as "4 ≡ 5 OR 6 ≠ 7 OR 4 ≡ 6"
we try each branch of the OR in turn, rejecting those that lead to
no solution (this may be pretty combinatorial, too).

Perhaps it is possible to use more Logic here — some sort of 
theorem proving. 
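
For very small scenes the exhaustive search can be written down directly. The following Python sketch is only an illustration under stated assumptions: the tuple encoding of equations ("EQ", "NOT", "OR", "IMPLIES") and the names partitions, holds, solve and verdict are hypothetical, and the heuristic search suggested above is not attempted.

```python
def partitions(items):
    """Yield every partition of a list of regions as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller

def holds(eq, partition):
    """Evaluate one symbolic equation against a candidate partition."""
    kind = eq[0]
    if kind == "EQ":       # both regions belong to the same body
        return any(eq[1] in b and eq[2] in b for b in partition)
    if kind == "NOT":
        return not holds(eq[1], partition)
    if kind == "OR":
        return any(holds(e, partition) for e in eq[1:])
    if kind == "IMPLIES":  # a -> b is the same as (not a) or b
        return (not holds(eq[1], partition)) or holds(eq[2], partition)
    raise ValueError(f"unknown equation type: {kind}")

def solve(regions, equations):
    """Return every partition that satisfies all equations (AND is implicit)."""
    return [p for p in partitions(list(regions))
            if all(holds(e, p) for e in equations)]

def verdict(solutions):
    """Name the three outcomes discussed in the text."""
    if not solutions:
        return "CONTRADICTORY"
    return "UNIQUE" if len(solutions) == 1 else "AMBIGUOUS"
```

For example, solve([1, 2, 3], [("EQ", 1, 2), ("NOT", ("EQ", 2, 3))]) leaves the single partition (1 2) (3). The number of partitions grows very quickly with the number of regions, which is why the text calls for heuristic methods.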

Conclusions and conjectures. The similarities between SEE and people
(see also 'Human perception vs. computer perception', page 254) stem
from the fact that, like SEE, people seem to use only a small number
of rules (although not necessarily those used by SEE), which work in
almost all cases, but when these rules lead to an ambiguity or
inconsistency ("conflicts"), there is reluctance to abandon them, and
mistakes or impossibilities are produced.

It is possible that, like SEE, people use primarily local clues,
and with less frequency more global information to disambiguate
interpretations. I think that, in the presence of objects (in 2-dim
line drawings, such as 'MOMO', for instance) not seen before, humans
follow general rules not unlike those used by SEE to distinguish
or decompose a scene into bodies. Rules that apply to all polyhedra
have to be invoked, since in the presence of previously unseen objects,
humans can not use a model of the object.

The more familiar an object is (or if we have reason to suspect it
or expect it), the faster we abandon the general rules and propose its
model as a possible explanation of part of a scene; we then jump to
a model matching routine (a la DT {MAC TR 37}) that tries to fit the
model to part of the scene (to a semi-isolated body); general rules
a la SEE prevent us from overflowing with our model into other bodies,
and help us to deal with partially occluded bodies.

ON NOISY INPUT

The performance of our programs is analyzed when the data has
imperfections consisting of (1) misplaced vertices, (2) missing
edges, (3) spurious extra lines, (4) missing faces, (5) two vertices
merged.

The section 'Analysis of Many Scenes' contains results of SEE
when applied to imperfect scenes.



Summary. It is easy to predict the operation of SEE when the
two-dimensional data supplied is clean, in the sense of being an accurate
representation of the three-dimensional scene.

In practice, of course, errors will occur in the data and it be- 
comes important to know how sensitive our program is to them. 

SEE has some serendipity. Many of the imperfections in the
data do not cause mistakes in the linking procedure, or the link
misplacements are not enough to cause erroneous identification.
But mistakes are made.

Here is how different types of imperfections are handled:

== The assignment of types to vertices is highly insensitive to errors
in the position of each vertex, except T's that become Forks or
Arrows. Two cures to the exceptions were found, only the first
of which is implemented:

(1) Allow tolerances in concepts of parallelism and collinearity.

(2) Allow a long but slightly twisted rectilinear segment to be 
"straightened", as indicated in comments on scene R17. 

== Missing edges are subdivided into three classes (discussed below);
two of them produce recoverable or detectable errors (hence,
susceptible of correction or prevention). It will be difficult to
detect if a segment of the third class is missing; these will
produce recognition mistakes.

== Additional lines, like the ones caused by edges of shadows, are not
easily detected as spurious or superfluous. Their presence mainly
produces a diminution in the number of useful links, thus sometimes
causing too conservative behavior -- i.e., proposition of too
many bodies.

== Whole faces may be missing. Ordinarily (see scenes L2, L19T),
the remaining part of the body gets correctly identified.

OBTAINING THE DATA 

The scenes analyzed by our program in this thesis were obtained
by one of two methods:

By free drawing. A line drawing representing three-dimensional objects
was made; the coordinates of each vertex were accurately measured (or
computed) and the information was put in the 'Input Format' form
previously described. Also the regions belonging to the background
were indicated as such.

These scenes have mnemonic names such as TSIAL, BRUXZ, etc. 

What kind of projection did you use? Were these isometric drawings?
Since no assumption is made on the rectilinear objects being drawn,
the drawings are not isometric, or perspective, or ... projections.
They could be any of them. It is not assumed that "we are dealing
with prisms, with faces of a body meeting at right angles (like the
corners of a cube)," or with convex objects. Neither the drawings nor
the program make any assumption of this type. If the reader wishes
to adopt the assumption (specified above in quotation marks), then the
drawings will correspond to orthogonal projections of three-dimensional
scenes.

No support hypothesis is needed: if necessary, the objects could
be floating in a transparent fluid having their same density.

By construction. Arbitrary but not too complicated objects were cut
from pine wood, with flat surfaces, and painted black. Their edges
were painted white. By placing them on a black table (see first few
pictures of this thesis) in different positions and combinations,
three-dimensional scenes were created (see figure 'TEST OBJECTS').
Pictures were taken with high contrast film slightly under-exposed
so as to render black everything but the lines. Diffuse illumination
eliminated shadows [Great help was received in the pictorial task
from Messrs. William H. Henneman, Devendra D. Mehta and David Waltz,
and is here acknowledged]. The photographs were taken with a depression
angle from 45° to 90° (that is, looking down), 50 mm focal length
lens, 35 mm camera (standard equipment).

The size of the prints is approx. 8½ by 11 inches (21.5 by 28 cm).
If some lines were not clear, they were retouched with white ink.
If some lines were missing, they were NOT added.

The pictures have names like L2 or R3, a letter and a digit.
Most of them are stereographic pairs, taken with both cameras having
parallel optical axes, and the sensitive film on the same plane.
SEE only analyzes one scene at a time, so the left picture is not
consulted when SEE analyzes the right picture, and vice versa.

A transparent millimetric mesh is laid on top of the prints,
and the coordinates are read by eye and put by hand in the 'Input
Format' form. The thickness of each line is about 1 mm (see figure
'TEST OBJECTS'); typically, the size of a scene is 10 or 15 cm; a
minimum error of ± 1 per cent in the coordinates of a vertex is
already present. The slopes and directions of short segments suffer,
naturally, much greater errors. Also, if two vertices are too close
together (about two millimeters) they are merged and codified as one.
We are simulating the kind of mistakes that are likely to occur.

Also, some bias is introduced, no doubt, by the human operators.
[In reading the coordinates in most of the scenes, immense help was
given by Miss Cornelia A. Sullivan and Mr. Devendra D. Mehta; the
author acknowledges it.]

Irrespective of the generation method, the scenes that appear in
this thesis were drawn in their final form by the PDP-6 computer
through a Calcomp plotter, and then inked and finished by hand.

Thus, it is possible to perceive in many of them the imperfections
of the data that SEE had to analyze.

MISPLACED VERTICES

The coordinates of a vertex may contain a small error or 'noise'.
How does this affect the type of a vertex? Does the type change?

Type     Effect of a small error in position
L        Not affected
FORK     Not affected
ARROW    Not affected
K        Transforms into MULTI
X        Transforms into MULTI
T        Transforms into ARROW or FORK
PEAK     Not affected
MULTI    Not affected



Many types are unaffected. Type K vertices transform into
MULTI, but since K's are seldom used by SEE, this is no big loss.

X's transform into MULTIs, and we lose two links here, which
makes SEE behave more conservatively. Also GOODT gets affected
(though not much).

The serious change is in the T's that get transformed into ARROWs
or FORKs, when these T's are matching T's. Because they are used
for linking otherwise disconnected pieces of a body, their loss
generally implies the partition of a body into two. See figure
'DISCONNECTED'.

Figure 'DISCONNECTED'

The T's under discussion are marked by small circles (•). In (a),
the misclassification of these T's into Arrows or Forks does not
break the occluded body, which retains its unity thanks to :1.
In (b), the same misclassification does break the occluded body,
reporting two objects instead of one, a possible but less desirable
answer. If the T's are not matching T's, as in (c), their
misclassification does not matter.



The loss of matching T's makes the program more conservative
in some cases. In some sense (see 'Desirability Criterion') this
is tolerable.

What other perils does the misclassification of the T's bring?
We should worry if, due to errors caused by T's, the occluded
body joins the occluding one.



DESIRABILITY CRITERION.

(1) We would like a SEE that never makes mistakes. Since this is
not possible, then

(2) We would like it to make mistakes of only one kind, either
joining two bodies that should be left separated (intrepid,
cavalier behavior), or leaving unattached two nuclei that
should be reported as a single object (conservative behavior).

(3) Among the two, we prefer a conservative SEE, because its
errors will be easier to correct (cf. Stereo Perception).




The T's should not originate the reporting of 1-2-3 as
part of one body.



Each T, when perturbed, will go to one of these states: (N) normal,
unperturbed; (L) "left", when Z moves toward :1, becoming
a FORK; or (R) "right", when Z moves away from :1, becoming
an Arrow.

For three T's of an occluded body, 3³ = 27 states are possible.
They are shown on the next page, in table 'THREE Ts'.
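
The count of 27 can be checked by enumeration. This short Python sketch only lists the state names in the style used by the examples in this section (such as R L L); the variable names STATES and THREE_TS are hypothetical, and the order produced need not match the order of the table.

```python
from itertools import product

# Each of the three T's is independently Normal, Left (a Fork) or Right (an Arrow).
STATES = ("N", "L", "R")

# One entry per combined state, written as three letters, e.g. "RLL".
THREE_TS = ["".join(combo) for combo in product(STATES, repeat=3)]
```

len(THREE_TS) is 27, and the examples quoted in the text (R L L, L R L, R R L, R L N) all appear in the list.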

How many of these 27 states will produce mis-links joining 1 with 3
or 2 with 3 or 1 with 4 or 2 with 4 (none of the four regions is
necessarily background)?

None.

The reason is that (see description of NOSABO) a T or an Arrow
or an L inhibits the link shown below,

[figure: the inhibited link]

so that (a) An arrow in position (I) [or (III)] suggests linking 1
with 4. This link is inhibited by the L at IV [or VI].
Example: Figure R L L in Table 'THREE Ts'.

(b) A Fork in position (I) [or (III)] suggests 

(i) linking 1 with 3. Inhibited because of the T or 

arrow in vertex II. 
(ii) linking 1 with 4. Inhibited because of the L in IV. 
(iii) linking 4 with 3. Depends on outside considerations. 

Discussed below. 
Example: L R L.

(c) An Arrow in position (II) suggests linking 1 with 2. 
Inhibited or allowed according to vertex V. Example: RRL. 

(d) A Fork in position (II) suggests 

(i) linking 1 with 3. Link inhibited by the T or arrow 

of I. 
(ii) linking 2 with 3. Inhibited by the T or arrow in III. 
(iii) linking 1 with 2. Inhibited or allowed according to 

vertex V. 
Example: R L N.
Thus, no link is possible, even under these "noisy" circumstances, 
between 1 and 3 or 2 and 3 or 1 and 4 or 2 with 4. That is, 
the 27 cases of table 'THREE Ts' are treated correctly. 

A possibility of bad linking exists between 4 and 3 in this
case, if two T's convert into forks and "help each other":

[figure] Two links originate the joining of 4 and 3.



Rather than get involved in this sub-problem, we will point 
out two solutions to the misplaced vertices: (1) by allowing some 
tolerance in 'parallel' and 'collinear' ; (2) by 'straightening out' 
crooked or twisted segments. We explain. 

Equal within epsilon (definition). a is equal within epsilon to b,
written a ≈ b, iff |a - b| < |ε|. Generally, ε > 0.

Tolerances in collinearity and parallelism. Two lines are parallel if
the sine of the angle formed by them is smaller than SINTO (sine ≈ 0).
Currently, SINTO = 0.15.

Lines ab and bc are collinear if
length ab + length bc ≈ length ac. Currently, COLTO = 0.05.
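
These tolerance tests can be sketched in Python as follows. The sketch is an illustration only, not the implemented routines: points are assumed to be (x, y) tuples, the function names are hypothetical, and COLTO is applied as an absolute tolerance in the units of the coordinates, which is a literal reading of the definition rather than something the text states.

```python
import math

SINTO = 0.15  # tolerance on the sine of the angle between two lines
COLTO = 0.05  # tolerance on the length comparison for collinearity

def equal_within(a, b, eps):
    """a is equal within epsilon to b iff |a - b| < |eps|."""
    return abs(a - b) < abs(eps)

def parallel(p, q, r, s, sinto=SINTO):
    """Line pq is parallel to line rs if |sin(angle)| is smaller than sinto."""
    ux, uy = q[0] - p[0], q[1] - p[1]
    vx, vy = s[0] - r[0], s[1] - r[1]
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norm == 0:
        raise ValueError("a segment has zero length")
    # |u x v| / (|u| |v|) is the absolute value of the sine of the angle.
    return abs(ux * vy - uy * vx) / norm < sinto

def collinear(a, b, c, colto=COLTO):
    """ab and bc are collinear if length ab + length bc ~ length ac."""
    return equal_within(math.dist(a, b) + math.dist(b, c), math.dist(a, c), colto)
```

Note that the sine test accepts two lines pointing in opposite directions as parallel, and that the length test accepts b only when it lies between a and c.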

We have implemented these definitions. Better definitions exist. 
These definitions allow most small inaccuracies in the coordinates 
of vertices to pass unnoticed. Although they are giving reasonable 
service, they are only temporary, since by relaxing too much the
criterion for parallelism and collinearity, strange things could
happen (fig. 'CROSSED').

Fig. 'CROSSED'

A too lenient definition of parallel and collinear could give the
following matching T's: a to d, b to f, c to e.

See also, in section 'Analysis of many scenes', comments to L9 and R9T.

Straightening twisted segments. The definitive cure is simple:
reassign the slope of bc to be that of ad, if bc is small, ad large
and the angles at b and c are close to 180°. See also comments to
figure R17. This has not been implemented. In this way, all cases of
table 'THREE Ts' will be solved. See also comments to scene R4.

Probably the preprocessor will automatically take care of this
rectification, since it may prefer to give a long segment ad instead
of three almost collinear shorter segments ab, bc, cd.

Since the straightening of a segment replaces some known vertices 
(which we suppose inaccurate) by other idealized vertices, we may be 
introducing uncertainty, in the form of non verified hypotheses, to our 
data. The object in the scene could really be "crooked" or twisted. 





Fig. 'TWISTED'

The object to the left is really bent as shown.
If we idealize it as in the right, we are
falsifying the information about it.

By replacing it by an idealized version, we may be creating 
problems for its identification, when we want to assign a name to it. 
But notice that the 'unbent' version or idealization is handier for 
SEE. 

If the information is very bad. Throw it away and read the scene
again. A simile indicates that the issue becomes one of allocation
of resources: if you receive a written message containing a few
wrong characters and missing words, you may use your brains and time
to deduce the omitted portions (by employing the redundancy, for
instance). If the dispatch is very garbled, you might as well request
a new one.

Summary. It is known how to handle small inaccuracies in the position
of the vertices.

MISSING EDGES

From time to time, an edge will fail to show up in the scene,
and the questions are (1) how much harm will be produced, and (2)
how can we detect and correct the anomaly. An example appears on
page 141.

Illegal scenes. Lines that end abruptly produce illegal inputs,
suggesting that segments are missing.

Fig. 'ILLEGAL'

In (a), a vertex has one edge.
In (b), the network can be separated by erasing just one edge.
Both are illegal scenes, indicating missing or extra lines.

Also (Figure 'ILLEGAL', (b)) a region can not be a neighbor of
itself -- another irregularity that points to deficient data. Cf.
comments to scene R3.

These constraints can be nicely exploited by a preprocessor. 
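
A preprocessor check of this kind might look like the following Python sketch. It is an illustration under assumptions: a scene is taken to be a list of edges, each a pair of vertex names, and the function names are hypothetical. The test for case (b) simply erases each edge in turn and asks whether its two ends are still connected, which is adequate for small scenes.

```python
from collections import defaultdict

def dangling_vertices(edges):
    """Case (a): vertices that have exactly one edge."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(v for v, d in degree.items() if d == 1)

def _reachable(start, edges):
    """Set of vertices connected to `start` through the given edges."""
    adjacent = defaultdict(set)
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adjacent[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def separating_edges(edges):
    """Case (b): edges whose erasure alone separates the network."""
    found = []
    for i, (a, b) in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]
        if b not in _reachable(a, rest):
            found.append((a, b))
    return found
```

An edge that ends in a dangling vertex is reported by both tests; either report marks the scene as illegal.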

Line proposer and line verifier. A line proposer is a program that
suggests places where a line may be missing; a line verifier is
essentially a precise line finder that searches for a line in only a
small portion of the scene, as told by the line proposer.
In the body of this section we will develop several heuristics for
use in a line proposer. The verifier is not discussed.

Blum's line proposer. An algorithm has been designed by Manuel Blum
{1968}, that will detect many places where lines are possibly missing.
It suspects concave regions. An angle bigger than 180° originates a
search for the omitted line in directions parallel to the neighbor
edges (fig. 'BLUM'). It also originates searches along its own
edges. In other conditions, a vertical line is searched.

Figure 'BLUM'

Region :2 is suspected to contain undetected lines,
because it is concave. Vertex v is chosen because
its internal angle is bigger than 180 degrees.
From it, Blum's proposer will suggest to the line
verifier to look for lines in directions VA' and
VB' (broken lines), parallel to the neighbor edges
A and B. It also searches (dotted lines) along
the continuation of lines C and D.

No harm is done by a bad proposer. Only some time is wasted. 
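
The test that triggers such a proposer, an internal angle bigger than 180 degrees, can be sketched in Python. This is only the triggering test, not Blum's algorithm: the region is assumed to be given as a list of (x, y) vertices in boundary order, the function name is hypothetical, and the choice of search directions is left out.

```python
def reflex_vertices(polygon):
    """Indices of the vertices whose internal angle is bigger than 180 degrees."""
    n = len(polygon)
    # Twice the signed area: positive when the boundary runs counterclockwise.
    area2 = sum(polygon[i][0] * polygon[(i + 1) % n][1]
                - polygon[(i + 1) % n][0] * polygon[i][1]
                for i in range(n))
    sign = 1 if area2 > 0 else -1
    found = []
    for i in range(n):
        ax, ay = polygon[i - 1]
        bx, by = polygon[i]
        cx, cy = polygon[(i + 1) % n]
        # Cross product of the incoming edge with the outgoing edge.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross * sign < 0:  # the boundary turns the "wrong" way here
            found.append(i)
    return found
```

A region is concave exactly when this list is not empty; each vertex in it is a place from which a line proposer could start its searches.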

Internal edges. If a missing line is totally internal to a body, and
is not detected by the line proposer, its absence will at most cause
conservative behavior in SEE. In some cases their absence does not
confuse SEE (figure 'MISSING').

The majority of missing internal edges cause concave regions to appear
(fig. 'BLUM'). They will be detected by a line proposer.

Fig. 'MISSING'

Cases where the disappearance of an internal
line (dotted) does not separate the body.

In (a), the object separates into two.
This case is recognized by Blum's heuristics.
Else, SEE could check for this configuration
as a special case.



External edges. Edges that separate two bodies are called external.
If undetected, their disappearance will cause 'intrepid' errors by
SEE, which are undesirable (see 'Desirability criterion' on page 212).
Two cases result: (1) Only part of the edge disappears; there is
possibility of correction. (2) The whole edge is both external and missing
(and the scene is still 'legal'): a mistake will occur. See figure
'External Edges'.

Case (1) Only part of an external edge disappears. It can be
detected because

(a) a concave region is generated, and

(b) the region has internal angles bigger than 180° where a
line "goes through": ab is collinear with cd.

Figure 'EXTERNAL EDGES'

A segment separating two bodies may disappear.
(1) If that segment is part of a larger segment,
it is possible to detect and correct the anomaly.
(2) If a whole external edge is missing, its
absence remains undetected, inducing a mistake
in SEE. In (i) an external edge disappears, and
creates an illegal figure.

Case (2) The complete edge is missing. Then (b) of case 1 fails, 
and detection is difficult. 



SPURIOUS EXTRA LINES 



They are lines that "should not be there", such as those 
caused by edges of shadows. 




Fig. 'LIGHT AND SHADOW'
Each body becomes two; each one is recognized
independently by SEE. Four bodies are found.

Shadows of rectilinear objects travel in planes that (in theory)
part an object in two (or more): the illuminated part, and the dark
one. Each is a separate object by itself, according to our definition
(see 'Several definitions of a body'), since they have plane boundaries.
SEE should recognize them.

In practice, we have not tried our program with scenes having
lines produced by shadows. A conservative behavior, like in figure
'LIGHT AND SHADOW', is expected.

Some shadows gradually diffuse; multiple lights cause multiple
shadows. These problems may have to be solved by assuming or
computing the direction or position of the light sources.



MERGED VERTICES

Two vertices fused in one will produce diminution in the number
of useful links they report, since the resulting vertex will
be of type MULTI. Thus, conservative behavior is expected from SEE
in these cases (see Fig. L19, L17T, R17, L4, etc. The program does
well in them, when not too many coincidences are present).

SUGGESTION

It is possible to analyze the vertices of type
MULTI and try to decompose them into simpler types (compare figure
R19 with 'WRIST'). Read comments to R19 and L19.

CONCLUSION

On scenes obtained from "real world" data, inaccuracies are
expected, and it is required of SEE to work well despite them.
Currently, the behavior of the program in these cases is not
discouraging, but is not extremely satisfactory, either. The
additional work needed depends heavily on obtaining genuine
test data, instead of the faked data used in the experiments
described.

BACKGROUND DISCRIMINATION BY COMPUTER

A program determines the regions that belong to the background
of a given scene; that is, the regions that are not members of any
of the bodies. Examples are given.

Need. The program SEE needs to know which regions of the scene
belong to the background (cf. 'SEE, a program that finds bodies in
a scene'). At present, this information is supplied by the user,
as described in sections 'Internal Format' and 'Input
Format' of a scene.

In the current vision experiments, it is not difficult to
determine the regions that form the background, since they are always
black and homogeneous (see first few pictures in this thesis). But
in more realistic scenes, there will be a great demand for a background
finding program.



Therefore, it is interesting to try to develop a program to separate
the "ground" in the back from the objects in the "foreground", having
limited information consisting of the scene as described in
section 'Internal Format', namely, vertices and edges.

That is, we will use in this task only "geometric" properties.



Such a program has been written, and works automatically under
the command of PREPARA, the function that converts a scene from its
'Input Format' to its 'Internal Format'. When the regions forming
the background are not supplied, PREPARA activates our program,
named BACKGROUND, and these regions are searched for; otherwise,
SEE is supplied with the background regions as declared in 'Input
Format'.

Example. Scene 'HARD'. The results obtained are

(SUSPICIOUS ARE NIL)
THE BACKGROUND OF HARD IS
(:34 :36 :35)
(:34 :36 :35)

Three regions are found to be part of the background: :34, :36,
and :35. That is correct.

We now proceed to describe the subroutines that make such
identification possible.

Suspicious. In a first pass, we collect the regions that "may be"
background, and call them "suspicious regions". Regions that are
not suspicious are LIMPIO (clean).

Ideally, if a region :R contains L's, FORKs, ARROWs or T's in
the position below, it is not a part of the background.



FIGURE 'BACKGROUND'
(panels (I), (II), (III), (IV): region :R at an L, a FORK, an ARROW and a T)

In an idealized situation, :R can not be part of the
background: it is clean, or free of suspiciousness.
:R will be called 'LIMPIO' (clean).

(I) means that the background [almost] never is the internal
part of an 'L' (the region containing the angle smaller than
180 degrees).

(II) means that the background does not contain FORKs.

(III) means that the background is not in the "inside" of an ARROW
(the background is not a 'proper arrow').

(IV) means that the background can not be the flat region of a 'T';
this in turn means that a body can not disappear under the
background and then reappear at some other point:

[figure] :3 is not the background.

We reinterpret rules (I)-(IV) as follows:

(I) A region "inside" an L is LIMPIO (clean) . 

(II) A region containing a fork is LIMPIO. 

(III) A region "inside" an arrow is LIMPIO. 

(IV) A region "on the flat side" of a T is LIMPIO. 

Clean Vertex (definition). A vertex is clean with respect to a region
if it indicates, through rules I-IV, that such region is LIMPIO.
For instance, K is clean for :1 and for :2,
since (III) indicates that :1 and :2 are LIMPIO.
K is not clean for :3.

These heuristics are not 100 per cent infallible; also, in a
moderately complicated scene, coincidences of vertices are bound to
occur, originating violations to I-IV. For instance, in figure CORN
(page 150), vertex UU is a Fork belonging to the background, in
contradiction with (II).

For completeness, we present a violation to each one of rules I-IV:

FIGURE 'VIOLATIONS' (panels (I) to (IV))

:1 is the background. In all four cases,
vertex V violates the rule specified at the
bottom of the figure. They are rare cases.
The situation indicates that rules I-IV
provide noisy information, which has to
be dealt with carefully. That is what is done.

The vertices of each region are analyzed under rules (I)-(IV). 
To allow for coincidences of vertices and rare cases (like those in 
figure 'VIOLATIONS'), it is permitted for a suspicious region to 
have a small number of clean vertices. 

The number of clean vertices is compared with a quantity that 
is a small fraction of L (the number of vertices on the boundary) ; 
currently, that fraction is L/9. 

== If the number of clean vertices, that is, vertices satisfying
I-IV, is bigger than L/9, we call that region LIMPIO ("clean").
In addition, (a) If L is large (bigger than 25, currently),
                 that region is BIGFACE, such as :21 of
                 scene L19 (page 144);
             (b) Otherwise, it is only LIMPIO (normal case).

== If it is not bigger than L/9, then it is SUSPICIOUS. Also,
             (a) If L is large (bigger than 25), the region
                 is BACKGROUND,
             (b) Otherwise it is only SUSPICIOUS (normal case).
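
The first-pass decision can be summarized in a few lines of Python. This is a sketch of the rule as stated, not the function SUSPICIOUS from the listings: the function name classify_region and its parameter names are hypothetical, and the counting of clean vertices under rules I-IV is assumed to have been done already.

```python
def classify_region(clean, boundary, fraction=9, big=25):
    """Classify a region from its count of clean vertices.

    clean    -- number of vertices that are clean for this region
    boundary -- L, the number of vertices on the boundary of the region
    """
    is_limpio = clean > boundary / fraction  # more than L/9 clean vertices
    is_large = boundary > big                # "large" means bigger than 25
    if is_limpio:
        return "BIGFACE" if is_large else "LIMPIO"
    return "BACKGROUND" if is_large else "SUSPICIOUS"
```

For instance, a region with three vertices, none of them clean, comes out SUSPICIOUS, while a region with five clean vertices out of seven comes out LIMPIO.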

That is, a LIMPIO region has to have at least
1 + [one vertex of each nine]
"clean" vertices.

Example. Region :3 has four 'clean'
vertices (four vertices indicate that :3
is LIMPIO) -- it can not be SUSPICIOUS.

Figure 'EQUILIBRIUM'

(This scene is correctly analyzed by SEE)
None of the three vertices of :1 is clean;
:1 will become Suspicious (a candidate for
background). Five of the seven vertices of
:2 are clean, so :2 is LIMPIO. Note that
vertex C is clean for :2 and not clean
for :1.

For example, when we apply the function SUSPICIOUS (see listings)
to every region of scene SPREAD, the suspicious regions turn out to be:
Suspicious only: :35 :18 :34 :2 :3 :12 :11 :33 :37
                 :47 :48 :46.
Background: :48.

Summary. By analysis of its vertices, each region is either LIMPIO or
SUSPICIOUS. The suspicious regions with more than 25 vertices are
classified right away as BACKGROUND: a suspicious region with many
edges is probably background.

The selection is done entirely using "local" properties: a
region is classified according to information supplied exclusively
by its own vertices.

FIGURE 'SPREAD'
SUSPICIOUS or BACKGROUND.

More global indications. Our goal is to decide which of the suspicious
regions are LIMPIO, and which are background.

== Since two background regions can not be contiguous (the background
can not be a neighbor of itself), the suspicious regions that
are contiguous with the background are classified in the
LIMPIO status.

In our example, :48 is background and therefore its suspicious
neighbor :18 gets cleaned and becomes LIMPIO.

== Links are established through the matching T's. We call them
b-links.

Ideally, a suspicious region b-linked to a LIMPIO region
gets cleaned; a suspicious region b-linked to the background gets
converted to background too.

Idealizing, suspicious region :1
becomes LIMPIO, and suspicious
region :2 becomes background.
A more complicated procedure is
actually used.



In practice, we allow for small errors as follows:

For each suspicious region, we notice if it is b-linked
to background (BA), suspicious (SO), or Limpio (LI).

BA == ==    If it is b-linked to background regions, we
            change it to Background, except if it has a
            background as neighbor, in which case we do
            nothing and continue.

() SO LI    If not b-linked to background, but b-linked both
            to Suspicious and Limpio regions,

            (1) If LI < SO, continue, do nothing.

            (2) If LI ≥ SO, classify this region as
                Limpio (LI is the number of LIMPIO regions
                b-linked to the current region under
                consideration).

() SO ()    If b-linked only to suspicious, continue, do
            nothing.

() () LI    If b-linked only to Limpio, change it to Limpio.

            Note: Sometimes I write Limpio, sometimes LIMPIO,
            they mean the same.

() () ()    If not b-linked, continue, do nothing.

We keep applying these rules until no change is observed. In
this way, we have eliminated several suspicious regions.

In SPREAD, the suspicious regions were 35, 18, 34, 2, 3,
12, 11, 33, 37, 47, 48, 46. :48 is known to be the background
(that was done earlier), so it is no longer suspicious. :18
is a neighbor of the background (:48), and got cleaned on the
page before this one.

:11 is b-linked with the LIMPIO :9 and with the suspicious :3.
Therefore, :11 changes to LIMPIO.

:3 is b-linked with the Limpio :11, so the suspicious :3
becomes Limpio.

:12 is b-linked to the Limpio :10, and gets cleaned.

:46 is b-linked to the background :48, and gets made
background, since :46 is not, at this moment, a neighbor of
background.

:34 is b-linked to the background :48, and gets made
background, since :34 is not a neighbor of background.

:37 is b-linked to the LIMPIO region :4, and transforms
into LIMPIO.

:35 is b-linked to the region :34, which is background,
so that the suspicious region :35 becomes background.

:2 is a suspicious region b-linked to the region :35, which
is part of the background. According to our rules, :2 becomes
part of the background.

At the end, only regions :33 and :47 remain suspicious:
(SUSPICIOUS ARE (:33 :47))

We collect all these 'stubborn' suspicious regions and label
them background, except those which are neighbors of background.

SUGGESTION

A better procedure may be to make the exception for
those regions that are neighbors of suspicious regions.
That is, two neighboring suspicious regions prevent
each other from becoming background. I have not explored
this possibility.

In the example SPREAD, :33 and :47 are made background.

-- If no region is background at this point, make one of the
"big faces" background. There is room here for improvement.

-- If there is no background yet, make background the region with
most vertices. This is not yet implemented.
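
The re-classification rules above can be sketched compactly. What follows is a Python sketch, not the LISP code of SEE (that is in the listings). The names `reclassify` and `symmetric`, the dictionary layout and the processing order are assumptions made for illustration. The b-links are the ones quoted in the SPREAD walk-through; the text gives almost no adjacency data, so only the pair (:18, :48) is entered as neighbors, which is also an assumption.

```python
BA, SU, LI = "BACKGROUND", "SUSPICIOUS", "LIMPIO"


def symmetric(pairs):
    """Turn a list of unordered pairs into a region -> set-of-regions map."""
    table = {}
    for a, b in pairs:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table


def reclassify(status, blinks, neighbors):
    """Apply the b-link rules until nothing changes; return a new status map."""
    status = dict(status)

    def suspicious():
        return [r for r in sorted(status) if status[r] == SU]

    def touches_background(r):
        return any(status.get(n) == BA for n in neighbors.get(r, ()))

    # Two background regions can not be contiguous: a suspicious
    # neighbor of the background is cleaned right away.
    for r in suspicious():
        if touches_background(r):
            status[r] = LI

    changed = True
    while changed:
        changed = False
        for r in suspicious():
            kinds = [status.get(x) for x in blinks.get(r, ())]
            ba, so, li = kinds.count(BA), kinds.count(SU), kinds.count(LI)
            if ba:                      # row "BA == =="
                new = SU if touches_background(r) else BA
            elif li and li >= so:       # rows "() SO LI" (2) and "() () LI"
                new = LI
            else:                       # LI < SO, only suspicious, or no b-links
                new = SU
            if new != SU:
                status[r] = new
                changed = True

    # 'Stubborn' regions become background unless they touch background.
    for r in suspicious():
        if not touches_background(r):
            status[r] = BA
    return status


# Illustrative data for scene SPREAD (adjacency is mostly unknown).
status = {r: SU for r in (35, 18, 34, 2, 3, 12, 11, 33, 37, 47, 46)}
status.update({48: BA, 9: LI, 10: LI, 4: LI})
blinks = symmetric([(11, 9), (11, 3), (12, 10), (46, 48),
                    (34, 48), (37, 4), (35, 34), (2, 35)])
neighbors = symmetric([(18, 48)])

final = reclassify(status, blinks, neighbors)
background = sorted(r for r, s in final.items() if s == BA)
print(background)  # [2, 33, 34, 35, 46, 47, 48]
```

With these inputs the sketch reproduces the background set reported for SPREAD. With real adjacency data the result could differ, because a region that touches the background is never converted to background.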

In our example, the (final) background regions are:
:33 :47 :35 :34 :2 :48 :46.   <-- BACKGROUND OF 'SPREAD'.

Other examples of background finding.

Scene CORN

[Program printout for scene CORN; mostly illegible in this copy.
It appears to end with (SUSPICIOUS ARE NIL), followed by the list
of background regions.]

Scene BRIDGE

[Program printout for scene BRIDGE; mostly illegible in this copy.
It appears to end with (SUSPICIOUS ARE NIL), followed by the list
of background regions.]

Scene MOMO

One mistake (:31) is produced here.

[Program printout for scene MOMO; mostly illegible in this copy.
It appears to report (SUSPICIOUS ARE (:31)), followed by the list
of background regions.]

FIGURE 'MOMO'

The problem is ambiguous. As in the case of body isolation (section
'The Concept of a Body'), the problem of determining the regions that
belong to the background of a scene (regions that belong to no body)
is ambiguous; many solutions are possible, as long as no two
background regions are contiguous.

Among the multitude of solutions there exists a preferred one,
which is "the" standard (common, familiar) interpretation chosen
by people.

Our program also tries to choose, among the many solutions,
the standard one.

Summary

A lenient algorithm finds regions (by analyzing the types of
their vertices, and their neighborhood relations) that may possibly
be background, and labels them "SUSPICIOUS". With the idea of
re-classifying the suspicious regions as 'LIMPIO' (clean, not
background) or 'BACKGROUND', a system of b-links is introduced. These
b-links provide more global information about the scene.

Members of the suspicious set are assigned to one of the other
two sets (Limpio or Background), while the algorithm tries to
minimize the b-links between Background and Limpio regions.

Fair results are obtained with the algorithm just
described. Sometimes, regions are obtained as Background that
are genuine components of a body ("Limpio") and vice versa.

Refinements are needed, but since in our present vision experiments
the background is a homogeneous black area (see first few pictures
of this thesis), no emphasis is placed on them right now.

STEREO PERCEPTION

Summary. So far we have discussed the identification of objects in a
scene and ignored the problem of locating them in three-dimensional
space.

There are several ways to achieve this. We will discuss here one 
of them: the use of more than one view of the same scene. 

A natural first step is to establish the correspondence between 
points in the two views; that is, given a point in one scene (left), 
to find the corresponding point in the other scene (right). Theorems
S-1 below and S-2 on page 234 express criteria
for this "stereo matching".



THEOREM S-1

If both cameras are identical, their optical
axes are parallel, and the films or sensitive
surfaces or retinas lie in the same plane,

then a simple necessary condition for two
image points, one in each retina, to
have come from the same 3-dim point
is that both image points (left and
right) have the same y-coordinate,
measured in the direction perpendicular
to the line joining the optical
centers.



SEE can independently decompose the left
and right scenes into the
bodies forming them, leaving
the problem of determining
which of the objects in the right scene
corresponds to an object
in the left scene. This can be done because each object will appear
in both views with the same maximum height and minimum height (highest
and lowest values of the y-coordinate of points belonging to that
object); comparisons are easily made by replacing the objects by
"intervals" consisting of these two numbers.

Further disambiguation can be achieved by the use of the function
(WHERE X_L Y_L X_R Y_R), which determines the (x, y, z) 3-dim position
of a point whose two 2-dim locations (X_L, Y_L) and (X_R, Y_R)
are known {Griffith, AI Memo 143}.

Figure 'POINTS'

Given two images of the same scene, before
we can proceed to situate it in 3-dim space,
it is necessary to know which points of the
left scene correspond to points of the right
scene: we have to discover the genuine pairs,
a small subset of the cartesian product
(a, b, c, d) X (e, f, g, h). It is
desirable to have an algorithm that avoids an
exhaustive search on this product.



Genuine Pair (definition). A pair of points (P_L, P_R) produced by a
real 3-dim point of the scene under consideration.

Theorem S-2 below gives conditions that a genuine pair must meet.
Specializing it to parallel axes produces theorem S-1 above.



THEOREM S-2

The left image P_L and the right image P_R of a point P
have associated with them a variable, computable from
(X_L, Y_L) or from (X_R, Y_R), that will acquire the same
value on P_L and on P_R. It is invariant under change
of scene.

For the case where the optical axes are parallel,
this variable is simply the y-coordinate (Y_L or Y_R), the
height of the image.

For the case where the optical axes meet, this
variable is γ, the angle that the plane P_L-C_L-P-C_R-P_R makes
with Γ, the plane containing the optical axes.

Any monotonic function of γ will be just as good
(cf. figure 'GENUINE PAIRS').



From the theorem, the algorithm (referred to in fig. 'POINTS') that
we may use to establish correspondence between points in the two
views is:

Compare only points with the same angle γ
(or, for parallel axes, the same y-coordinate).

Points with different γ can not
come from a genuine pair.

For each body, the knowledge of the 3-dim location of a few of its
vertices will be sufficient to position that body in real space,
achieving in this way the goal of this section.
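
To make the positioning step concrete, here is a minimal Python sketch of the textbook triangulation for the parallel-axes case of Theorem S-1. It is not Griffith's WHERE function, whose details are in AI Memo 143 and are not reproduced in this section. The name `where`, the parameters `focal` and `baseline`, and the coordinate conventions (left optical center at the origin, right optical center at (baseline, 0, 0), retinal coordinates measured from each optical axis) are all assumptions.

```python
def where(x_l, y_l, x_r, y_r, focal=1.0, baseline=1.0):
    """Triangulate a 3-dim point from a genuine pair (parallel optical axes).

    Illustrative only: standard pinhole geometry, not Griffith's WHERE.
    """
    if abs(y_l - y_r) > 1e-9:
        raise ValueError("not a genuine pair: heights differ (Theorem S-1)")
    disparity = x_l - x_r
    if disparity <= 0:
        raise ValueError("non-positive disparity: no finite point in front")
    z = focal * baseline / disparity        # depth along the optical axes
    return (x_l * z / focal, y_l * z / focal, z)


# The point (1, 2, 10) seen by two cameras 0.5 apart with focal length 1
# projects to (0.1, 0.2) on the left retina and (0.05, 0.2) on the right.
x, y, z = where(0.1, 0.2, 0.05, 0.2, focal=1.0, baseline=0.5)
print(x, y, z)  # approximately 1.0 2.0 10.0
```

The height test at the top is the necessary condition of Theorem S-1; a pair that fails it is rejected before any depth is computed.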

See Digression 1 in section 'The concept of a body' for a
different approach.

[Diagram of the two retinas L and R with lines of constant γ drawn
through points A and B; not reproduced in this copy.]

Figure 'γ-PARAMETRIZATION'

From geometrical considerations and the coordinates of a
point P_L in L, it is possible to attach to the line A-P_L
an angle γ. Similarly, an angle is obtained for lines of R.
It can now be said that a genuine pair (P_L, P_R) must
have the same γ for P_L and P_R.

γ is a physical quantity, namely the angle that
the plane passing through the image P_L and the optical
centers C_L and C_R makes with the "horizontal" plane Γ
(Γ contains the optical axes). Clearly, for P_L and
P_R to be produced by a point P in 3-dim space, the γ
of P_L must be equal to the γ of P_R. This is a necessary
condition that is easy to check.

A real point P of the scene produces a left image P_L (which has
a certain value of the angle γ) and a right image P_R with the same
value of γ (figure 'γ-PARAMETRIZATION').

Thus, given a point in one scene, we
have to search for its genuine pairs
in the other scene among the points
with the same γ. They will be found
along a straight line through A or B.

Parametrization of the scene is possible not only by using γ;
a monotonic function of γ will do.

For computational efficiency, it may be advisable to store the
points of the scenes in arrays according to the value of their γ's.

The function LINE maps points of L into lines of R.
An image point P_L may have come from different 3-dim points P, P', P'', ...
all of them situated on the line of sight of P_L. The right images
of P, P', P'', ... all fall on a straight line, which is the intersection
of the shaded plane [see fig. 'GENUINE PAIRS']
and the right retina.

When the optical axes are parallel

In this case, points A and B on
line C_L-C_R (fig. 'GENUINE PAIRS') travel to infinity, and lines P_L-A
and P_R-B become horizontal (parallel to C_L-C_R). The situation looks
like

[Diagram of retinas L and R in the parallel-axes case, with a
horizontal line at height 10.0; not reproduced in this copy.]

A genuine pair (P_L, P_R) will
have the same y-coordinate for
both of its elements (10.0 in
this case).

So, given a left image point P_L, we have to search only
among the points of R with the same height to find "the" P_R that
will make a genuine pair (P_L, P_R).

But several candidates for P_R may be found, because many points
may lie on each horizontal line of R.
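
The search restricted to equal heights can be sketched as follows. This Python fragment is illustrative and is not taken from the thesis programs: the name `candidate_pairs`, the (name, x, y) tuple layout, the rounding parameter `digits` and all coordinates are assumptions. The point labels a to h follow figure 'POINTS'.

```python
from collections import defaultdict


def candidate_pairs(left_points, right_points, digits=1):
    """For each left point, list the right points at the same height.

    Right points are bucketed by y once, so the cartesian product of the
    two point sets is never enumerated.
    """
    by_height = defaultdict(list)
    for name, _x, y in right_points:
        by_height[round(y, digits)].append(name)
    return {name: by_height.get(round(y, digits), [])
            for name, _x, y in left_points}


# Hypothetical coordinates for the points of figure 'POINTS'.
left = [("a", 3.0, 10.0), ("b", 5.0, 10.0), ("c", 4.0, 7.5), ("d", 6.0, 2.0)]
right = [("e", 2.0, 10.0), ("f", 4.5, 10.0), ("g", 3.0, 7.5), ("h", 5.0, 2.0)]

pairs = candidate_pairs(left, right)
print(pairs)  # {'a': ['e', 'f'], 'b': ['e', 'f'], 'c': ['g'], 'd': ['h']}
```

Points a and b each keep two candidates because two right points share their height. This is the ambiguity that the comparison of whole bodies, described next, is meant to reduce.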



USE OF SEE IN STEREO PERCEPTION 

We can use the invariance of the variable described in Theorem
S-2 to locate objects in three-dimensional space, from a pair of
stereo views (we will suppose parallel axes; the other case is treated
similarly), as follows:

(1) Make an analysis of the left scene with SEE, identifying the
bodies.

(2) Do the same for the right scene.

(3) Reduce each body to an interval formed by two numbers, its
maximum and minimum height, specifying "closed" if the absolute
extremum of the body is known, "open" if not.

In this way we reduce each scene to a set of intervals (see
figure 'INTERVALS').

Each body is reduced
to an interval.

(4) Use these intervals to select which left body goes with which
right body. The answer is simple (because it is unique) even
in moderately crowded scenes.

It is simple to take into account the fact that an open
end of an interval indicates that the interval can extend
further at that end.

Sources of difficulties are:

(a) Two bodies have the same interval, meaning they have identical
maximum heights and minimum heights. This is possible.

Quite easy to resolve: reduce some faces to intervals and compare them.

(b) A body is seen in the left scene but not in the right scene (figures
L12, R12).

(c) SEE partitions one body in two in one scene, but not in the
other.

The "open" and "closed" indications will help here.
Also, remember that when comparing these intervals we are using
just a very small part of the total information concerning each body.
When the selection is narrowed down to two or three candidates
["left-body 1 is either right-body 2 or right-body 5"], one can use

(1) the WHERE function of Griffith (op. cit.),

(2) as in (a) above, the intervals for each face of the
objects, so as to choose as "genuine pair" those two
objects with most agreement in the intervals of their
faces;

(3) perhaps a face of unusual shape is enough for discrimination,
if it appears both in left and right scenes,
or the number of vertices below the center of gravity.



Summary

In summary, I should like to point out that, while much
has been stated within the somewhat constricting framework
of this article, much remains to be stated. Certain, but
not all, important classes of presentations have been
treated, and there remain horizons as yet unexplored.
Conceivably, the author will attempt, ex nihilo nihil fit, to
establish a more general perspective in the course of a
subsequent article. [source citation illegible]

Also, the reader is referred to other
articles on the same topic.






Scene L10 - R10. SEE analyzes independently the left
and right scenes, obtaining the following bodies:

LEFT SCENE (L10)
(BODY 1. IS :5 :1 :4 :12)
(BODY 2. IS :6 :15 :7 :11 :14)
(BODY 3. IS :8 :9 :10 :3)
(BODY 4. IS :2 :13)

RIGHT SCENE (R10)
(BODY 1. IS X:3 X:5 X:6 X:14)
(BODY 2. IS X:13 X:1 X:11 X:9 X:15)
(BODY 3. IS X:8 X:2 X:10)
(BODY 4. IS X:4 X:7 X:12)

For each of the eight bodies, we compute its minimum height and its
maximum height, obtaining the following intervals:

L10                                      R10

:5 :1 :4 :12       --> [66,105)    [67,154] <-- X:3 X:5 X:6 X:14
:6 :15 :7 :11 :14  --> [79,120]    [78,119] <-- X:13 X:1 X:11 X:9 X:15
:8 :9 :10 :3       --> [68,152]    [65,103) <-- X:8 X:2 X:10
:2 :13             --> [21,82)     [22,82)  <-- X:4 X:7 X:12

These intervals are compared (left with right), trying to find
pairs with tolerably small discrepancies between their values [if the
interval has an open end, differences can be larger]. For 'L10 - R10',
these are

[66,105) - [65,103)
[79,120] - [78,119]
[68,152] - [67,154]
[21,82)  - [22,82)

which corresponds to the following identification of bodies:

:5 :1 :4 :12       corresponds to  X:8 X:2 X:10
:6 :15 :7 :11 :14  corresponds to  X:13 X:1 X:11 X:9 X:15
:8 :9 :10 :3       corresponds to  X:3 X:5 X:6 X:14
:2 :13             corresponds to  X:4 X:7 X:12
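
The comparison of intervals can be sketched as below. This is a Python illustration under stated assumptions, not the program used in the thesis: the names `Interval`, `agree` and `match_bodies` are hypothetical, and so are the two tolerances (3 units for a closed end, 10 units when either interval is open at that end). The text says only that discrepancies must be "tolerably small" and may be larger at an open end. The body numbers and intervals are those of scene L10 - R10.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    low: float
    high: float
    low_closed: bool = True
    high_closed: bool = True


def agree(a, b, tight=3, loose=10):
    """True when both ends of a and b differ by a tolerable amount."""
    low_tol = tight if (a.low_closed and b.low_closed) else loose
    high_tol = tight if (a.high_closed and b.high_closed) else loose
    return abs(a.low - b.low) <= low_tol and abs(a.high - b.high) <= high_tol


def match_bodies(left, right, tight=3, loose=10):
    """Map each left body to its right body, or to the list of candidates
    (possibly empty) when the choice is not unique."""
    result = {}
    for name, interval in left.items():
        found = [r for r, other in right.items()
                 if agree(interval, other, tight, loose)]
        result[name] = found[0] if len(found) == 1 else found
    return result


# Intervals of scene L10 - R10, keyed by body number.
left = {1: Interval(66, 105, high_closed=False), 2: Interval(79, 120),
        3: Interval(68, 152), 4: Interval(21, 82, high_closed=False)}
right = {1: Interval(67, 154), 2: Interval(78, 119),
         3: Interval(65, 103, high_closed=False),
         4: Interval(22, 82, high_closed=False)}

matches = match_bodies(left, right)
print(matches)  # {1: 3, 2: 2, 3: 1, 4: 4}
```

Under these tolerances every left body has exactly one candidate, in agreement with the identification listed above. When a list is returned instead of a single body, the finer tests (1) to (3) mentioned earlier would have to decide.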

Once these correspondences between objects in the two images are
found, the function (WHERE ...) {Griffith} will position these bodies
in three-dimensional space, achieving our goal.

CONCLUSIONS



LOOKING BEHIND 



When I started to work on these problems , the idea was to 
describe an object by using a model, and with this model in memory, 
to search the scene looking for sub-parts of it that would fit the 
description. 

This work ended (as far as this thesis is concerned) with a 
program that finds bodies without having a model of them. 

But that is good. 

We did not know at the beginning that this could be done. 



LOOKING AHEAD 

a. Suggestions for further work 

b. Comments 

c. Recommendations

d. Summary 

e. Conclusions 

f. Evaluation 

g. Extensions and Implications 



All these matters are 
normally encountered 
grouped in a chapter 
at the end of the work 



I can only partially lump all these important matters in one 
final section; many times I cite them in context, that is, next to 
the figure or subject that evokes them, or with which they are most 
closely related. As a result, they are spread through the body of 
this dissertation. 

Also,
(1) The box [SUGGESTION] appears throughout this thesis near a
partially unsolved or partially formulated problem, and/or its
partially outlined or partially new solution.

(2) A list of such suggestion boxes is given elsewhere in this thesis.

(3) The remaining portion of this section and, in general, the
sections close to the end of this work, abound in statements
of types (a.) through (g.).

(4) I have tried to start each section with a brief summary, and end
it with a summary or conclusion.

(5) The section 'Introduction' (page 10) specifies the problems
treated in this thesis, and the section 'Preliminary view of
Scene Analysis' gives a general view of available
methods.

SUGGESTION




General notation. To put, remove, etc., links, we
may develop a notation that will look like

(WHEN A (Y A) (B :1 C :3 D :2)
      D (K ( A F ..)) (A :3 E :4 F :2)
 THEN
      PUT LINK KIND 3 :3 :4
      NO LINK :1 :2 )

"When A is a vertex of type 'Y', and
D is a vertex of type 'K', and
A and D are joined as specified,
then
put a link of kind 3 between regions :3 and :4, and
do not put a link between :2 and :1."

The general notation is

(WHEN P E E')

"when predicate P is satisfied, evaluate expression E (execute
E), otherwise execute E' (which may be missing)".

In this notation, the predicate P corresponds to a geometric
pattern or configuration, and the expressions E and E' to the
establishment or removal of links.

In SEE, this part is handled by LISP functions (hand-coded),
one for each particular heuristic. The suggestion is to develop this
general notation, and an interpreter for it. This will speed up
programming and checking, but will slow down the execution to
some extent.
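
As a rough illustration of what such an interpreter might look like, the following Python sketch treats (WHEN P E E') as a function that builds a rule. Nothing here comes from SEE: the names `when`, `vertex_is`, `both` and `put_link`, the scene dictionary and the representation of links as (kind, pair of regions) are all hypothetical. The geometric part of the pattern ("A and D are joined as specified") and the NO LINK clause are left out.

```python
def when(predicate, then_action, else_action=None):
    """Build a rule: run then_action if predicate holds, else else_action."""
    def rule(scene, links):
        if predicate(scene):
            then_action(links)
        elif else_action is not None:
            else_action(links)
        return links
    return rule


def vertex_is(name, kind):
    """Predicate: the vertex called name has the given type."""
    return lambda scene: scene["vertex_types"].get(name) == kind


def both(*predicates):
    """Predicate: every one of the given predicates holds."""
    return lambda scene: all(p(scene) for p in predicates)


def put_link(kind, region_a, region_b):
    """Action: record a link of the given kind between two regions."""
    return lambda links: links.add((kind, frozenset((region_a, region_b))))


# "When A is a 'Y' and D is a 'K', put a link of kind 3 between :3 and :4."
rule = when(both(vertex_is("A", "Y"), vertex_is("D", "K")), put_link(3, 3, 4))

scene = {"vertex_types": {"A": "Y", "D": "K"}}
links = rule(scene, set())
print(links)  # {(3, frozenset({3, 4}))}
```

Building rules from small predicates and actions keeps each heuristic short to state, which is the stated aim of the notation; the extra function calls are the execution cost mentioned above.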

Use. The main use of the new notation or language is for trying
new heuristics. Actually, it is not difficult to hand-code the
new heuristic in LISP (see function EVERTICES in listings), because
everything reduces to calls to NOSABO, THROUGHTES, GEV, SUME, etc.
I was thinking that a simple MACRO of LISP could transform from the
notation (WHEN P E E') to LISP function calls.

Since what the notation or language is really doing is expressing
a two-dimensional configuration P as a linear string, a more
ambitious project would be to use the light pen and draw this configuration,
and then have our interpreter or compiler produce the LISP program.
This may look a little like AMBIT-G {Christensen}.

Assigning a name to an object

Problem. SEE has separated a scene into bodies. What are they?
Is there a pyramid among them? Where are the parallelepipeds?

To answer this, information can be supplied to the program in
the form of a symbolic description or model of the object we are
trying to find. A model is an idealized account of a class of objects,
all receiving the same name, like "triangular pyramid" or "house".
Models may have parameters that acquire values after a given instance
of the model has been found in a scene. Examples are "height" or
"length of bottom side".

Some programs that follow the above procedure to name objects
in a scene are described and discussed in a Master's Thesis {Guzman}.
There are difficult problems to be solved if we are to make the
system able to recognize occluded objects in many situations.

One could, of course, bypass SEE and look for particular objects,
as is done by Polybrick {Hawaii 69}, a program that finds
parallelepipeds.

Do not use over-specialized assumptions. Use more information

In trying to solve a problem, people will apply quite different methods.
They may also make quite different assumptions, some of which
may not hold. Due to particular experience, environment, preferences,
etc., some subjects may be using over-specialized assumptions,
instead of requesting more data, more information to solve the
problem. We may bias our views and risk arriving at conclusions
(of the "common sense" type) which are valid only for restricted
segments of the population, or in particular conditions or situations.

Holes. For instance, if most of the readers of this thesis [technical
specialists, who have learned to read, are interested in graphical
processing and computers, etc.; who may not be considered a
representative cross-section of Homo sapiens] perceive "objects" a, b
and c of figure 'HOLES' as holes {Winston}, we may be tempted to
conclude that this is a general property, and rush to write a
subroutine to find such orifices. Perhaps other sectors of our
population would simply say, with respect to a, b, c of figure
'HOLES', that "there is not enough information to make a decision"
(see also section 'On optical illusions'). Or they may come up with
different answers, using their set of assumptions, which may be
different from ours, since their experience is different too.
The Ames Room (see Box) and Gregory (see Box) warn us
of this.

Fig. 'HOLES'
The belief that objects a, b, c
have to be interpreted by all
men, and hence by a program, as
holes in the larger box, is
dangerous {cf. AI Memo 163}.



Another example of over-specialization

For people familiar with
Descriptive Geometry, it is easy to see that figure 'DESCRIPTIVE' (I)
shows a straight line in the first octant. For them, indeed, it
is easy to visualize this line in three dimensions and have a fairly
good idea of its position and orientation in space, just from
figure (I).

Other persons would need a more conventional figure, such as
figure 'DESCRIPTIVE' (II), to visualize the same line, to get the
same idea.

What happened was that the first group of persons were using
specialized knowledge: their minds were trained, figure (I) was
familiar to them, etc.





Figure 'DESCRIPTIVE' (I) and (II)

Conclusion

Before looking for heuristics and shortcuts, before making
assumptions, deductions, etc., let us be sure that there is enough
data to solve our problem.

Human perception versus computer perception

Given a two-dimensional
line-drawing of a three-dimensional scene, the problem of finding
bodies in it is inherently ambiguous: many 3-dim scenes can generate
the same 2-dim scene.

Multiple solutions are possible. Moreover, the metatheorem
stated earlier in this thesis guarantees that a solution always exists,
and provides ways to construct it. We call this solution "trivial";
in effect it is trivial to write a computer program that will
invariably find it.

From the multitude of possible solutions, human beings select
one, which is * different from the trivial, and call it the "normal"
or "common" or "standard" or "reasonable" interpretation of the
scene.

Our program SEE also selects one of the many solutions.
How does its selection compare with the human choice?

-- When the scene is "clear", in the sense of evoking human
unanimity, SEE will * also select that same answer. Example:
Figure 'TOWER'.

-- As the scene or drawing gets complicated or ambiguous, mortal
behavior deteriorates; opinions split, optical illusions may emerge
(indicating that contradictory evidence is perceived), several
plausible answers are emitted.

The answer of SEE in these cases will * be found among the
humanly plausible selections. In some cases, it may not agree
with the majority.

-- Finally, people make mistakes. They will see an object that is
not there, or will fail to see an object, or classify it as
"impossible".

But SEE also errs. It sometimes succeeds where people fail;
more often it is the other way around.



* In an overwhelming majority of cases.

TABLE "ASSUMPTIONS"

ASSUMPTIONS MADE BY THE PROGRAM

These assumptions have to be obeyed for SEE to give good results:

-- The objects are three-dimensional solids formed by planes.
   No needles or cardboards allowed.

-- They produce a two-dimensional image or projection where all
   lines are straight (2).

-- Faces have no drawings, marks, labels, etc., imprinted on them.

-- Objects do not have holes in them.

(1) See section 'On optical illusions' for conditions for partial
lifting of this assumption.

(2) See section 'On curved objects' for conditions for partial lifting
of this assumption.



ASSUMPTIONS NOT MADE BY THE PROGRAM

These assumptions are not necessary for the correct functioning of SEE;
it will work well with or without them.

-- Only prisms are allowed.

-- The scene is a parallel projection, or isometric drawing.

-- The objects are convex.

-- The model or description of the object has to be known to SEE.

-- The objects have to appear unoccluded or unobstructed in the view.

-- The objects have "weight" in the vertical direction and will
   fall if not supported.

-- The background is known in advance (see 'On background
   discrimination by computer').

I repeat, these assumptions are NOT made by our program.

ANNOTATED LISTING OF THE FUNCTIONS USED

You do not have to know these things in order to use SEE (reading
'How to use the program' is enough) or to understand
what it does (it is explained in 'SEE, a program that finds bodies in
a scene'); these things are put here merely for completeness
and to make easier the understanding of the inner workings of SEE.

A listing is a formal description

There is a stronger reason,
however. A listing of the programs is a formal description, an
algorithm, an exact statement in a formal language of what we may
have been describing, perhaps inaccurately, in a natural language
(English). It becomes the starting point of serious discussions.
The reader who is skeptical at some point, or did not understand
some English statement, can always clarify his doubts in the listing.
To be understandable, the listing has to have annotations, comments.

A mathematician is not forced to explain his work always in natural
language, but rather he is allowed to employ abstract notations,
symbolisms, formalizations of his thoughts (indeed, it is preferable
this way). A programmer should not hide his listings (he should not
be forced to re-state his algorithms in natural language exclusively
{ 68}) and force his readers to use the ambiguous channels
of his natural language communication.

And this brings another point. Not only should a programmer not
hide the listing (unless there are bugs or incomplete subroutines),
but he should not hide the programs (unless they are banal); by this
I mean honest and reasonable efforts should be made to facilitate
the access of future potential users to these programs. Include:

-- Documentation

-- Listings, tape or card deck names, etc.

-- Test data

-- Printout of an interaction with such test data,
   including loading, compilation, execution, results.

-- Time spent (by machine and by man).

See also R. Kain's letter {C. ACM March 67}.